THE ANNALS 
of 

MATHEMATICAL 

STATISTICS 

(FOUNDED BY H. C. CARVER)

The Official Journal of the Institute of 
Mathematical Statistics 


VOLUME XIII


1942 



THE ANNALS 

OF MATHEMATICAL STATISTICS 


EDITED BY 

S. S. WILKS, Editor

A. T. CRAIG        J. NEYMAN


WITH THE COOPERATION OF

H. C. Carver
H. Cramér
W. E. Deming
G. Darmois
R. A. Fisher
T. C. Fry
H. Hotelling
R. von Mises
E. S. Pearson
H. L. Rietz
W. A. Shewhart


The Annals of Mathematical Statistics is published quarterly by the
Institute of Mathematical Statistics, Mt. Royal & Guilford Aves., Baltimore,
Md. Subscriptions, renewals, orders for back numbers and other business
communications should be sent to the Annals of Mathematical Statistics, Mt.
Royal & Guilford Aves., Baltimore, Md., or to the Secretary of the Institute
of Mathematical Statistics, E. G. Olds, Carnegie Institute of Technology,
Pittsburgh, Pa. Changes in mailing address which are to become effective for
a given issue should be reported to the Secretary on or before the 15th of the
month preceding the month of that issue. The months of issue are March,
June, September and December.

Manuscripts for publication in the Annals of Mathematical Statistics
should be sent to S. S. Wilks, Fine Hall, Princeton, New Jersey. Manuscripts
should be typewritten double-spaced with wide margins, and the original copy
should be submitted. Footnotes should be reduced to a minimum and whenever
possible replaced by a bibliography at the end of the paper; formulae in
footnotes should be avoided. Figures, charts, and diagrams should be drawn on
plain white paper or tracing cloth in black India ink twice the size they are to
be printed. Authors are requested to keep in mind typographical difficulties
of complicated mathematical formulae.

Authors will ordinarily receive only galley proofs. Fifty reprints without
covers will be furnished free. Additional reprints and covers furnished at cost.

The subscription price for the Annals is $6.00 per year. Single copies [price illegible].
Back numbers are available at $6.00 per volume, or $1.60 per single issue.


Composed and Printed at the 
WAVERLY PRESS, Inc.
Baltimore, Md., U. S. A.



CONTENTS OF VOLUME XIII
Articles

Anderson, Paul H. Distributions in Stratified Sampling
Anderson, R. L. Distribution of the Serial Correlation Coefficient
Battin, I. L. On the Problem of Multiple Matching
Bernstein, S. (Translated by Emma Lehmer). Solution of a Mathematical Problem Connected with the Theory of Heredity
Burr, Irving W. Cumulative Frequency Functions
Camp, Burton H. Some Recent Advances in Mathematical Statistics, I
Chung, Kai-Lai. On Mutually Favorable Events
Craig, C. C. Some Recent Advances in Mathematical Statistics, II
Dwyer, P. S. Grouping Methods
Geiringer, Hilda. Observations on Analysis of Variance Theory
Hart, B. I., and von Neumann, John. Tabulation of the Probabilities for the Ratio of the Mean Square Successive Difference to the Variance
Hasel, A. A. Estimation of Volume in Timber Stands by Strip Sampling
Kimball, Bradford F. Limited Type of Primary Probability Distribution Applied to Annual Flood Flows
Koopmans, Tjalling. Serial Correlation and Quadratic Forms in Normal Variables
Lonseth, A. T. Systems of Linear Equations with Coefficients Subject to Error
Lotka, Alfred J. The Progeny of an Entire Population
Mann, H. B., and Wald, A. On the Choice of the Number of Intervals in the Application of the Chi Square Test
Samuelson, P. A. A Method of Determining Explicitly the Coefficients of the Characteristic Equation
Satterthwaite, Franklin E. A Generalized Analysis of Variance
Satterthwaite, Franklin E. Linear Restrictions on Chi-Square
Satterthwaite, Franklin E. Generalized Poisson Distribution
Scheffé, Henry. On the Theory of Testing Composite Hypotheses with One Constraint
Scheffé, Henry. On the Ratio of the Variances of Two Normal Populations
Stephan, Frederick F. An Iterative Method of Adjusting Sample Frequency Tables when Expected Marginal Tables are Known
von Mises, R. On the Correct Use of Bayes' Formula
von Neumann, John, and Hart, B. I. Tabulation of the Probabilities for the Ratio of the Mean Square Successive Difference to the Variance

















Wald, Abraham. Asymptotically Shortest Confidence Intervals
Wald, A., and Mann, H. B. On the Choice of the Number of Intervals in the Application of the Chi Square Test
Wald, Abraham. Setting of Tolerance Limits when the Sample is Large
Wilks, S. S. Statistical Prediction with Special Reference to the Problem of Tolerance Limits
Wolfowitz, J. Additive Partition Functions and a Class of Statistical Hypotheses


Notes

Beckenbach, E. F. Convexity Properties of Generalized Mean Value Functions
Birnbaum, Z. W. An Inequality for Mills' Ratio
Curtiss, J. H. A Note on the Theory of Moment Generating Functions
Dieulefait, Carlos E. Note on a Method of Sampling
Fischer, Carl H. A Sequence of Discrete Variables Exhibiting Correlation Due to Common Elements
Geiringer, Hilda. A Note on the Probability of Arbitrary Events
Girshick, M. A. Note on the Distribution of Roots of a Polynomial with Random Complex Coefficients
Girshick, M. A. A Correction
Hart, B. I. Significance Levels for the Ratio of the Mean Square Successive Difference to the Variance
Lukacs, Eugene. A Characterization of the Normal Distribution
Mann, Henry B. The Construction of Orthogonal Latin Squares
Paulson, Edward. An Approximate Normalization of the Analysis of Variance Distribution
Paulson, Edward. A Note on the Estimation of Some Mean Values for a Bivariate Distribution
von Neumann, John. A Further Remark Concerning the Distribution of the Ratio of the Mean Square Successive Difference to the Variance
Wald, Abraham. On the Power Function of the Analysis of Variance Test

Miscellaneous

Abstracts of Papers
Annual Report of the Secretary-Treasurer of the Institute
Report of the Dallas Meeting of the Institute
Report of the New York Meeting of the Institute
Report of the Poughkeepsie Meeting of the Institute



















THE ANNALS 
of 

MATHEMATICAL 

STATISTICS 

(FOUNDED BY H. C. CARVER)

THE OFFICIAL JOURNAL OF THE INSTITUTE
OF MATHEMATICAL STATISTICS

Contents

Distribution of the Serial Correlation Coefficient. R. L. Anderson
Serial Correlation and Quadratic Forms in Normal Variables. Tjalling Koopmans
A Generalized Analysis of Variance. Franklin E. Satterthwaite
Distributions in Stratified Sampling. Paul H. Anderson
Solution of a Mathematical Problem Connected with the Theory of Heredity. S. Bernstein (Translated by Emma Lehmer)
Some Recent Advances in Mathematical Statistics, I. Burton H. Camp
Some Recent Advances in Mathematical Statistics, II. C. C. Craig

Notes:

A Further Remark Concerning the Distribution of the Ratio of the Mean Square Successive Difference to the Variance. John von Neumann
Convexity Properties of Generalized Mean Value Functions. E. F. Beckenbach

Report of the New York Meeting of the Institute
Report of the Dallas Meeting of the Institute
Annual Report of the Secretary-Treasurer of the Institute
Abstracts of Papers

Vol. XIII, No. 1 - March, 1942












DISTRIBUTION OF THE SERIAL CORRELATION COEFFICIENT 

By R. L. Anderson
North Carolina State College

1. Introduction. The problem of serial correlation was brought to the
attention of statisticians by Yule in 1921 [9]. Both Yule and Bartlett [2] have shown
that the ordinary tests of significance are invalidated if successive observations
are not independent of one another. The serial correlation coefficient has been
introduced as a measure of the relationship between successive values of a
variable ordered in time or space. Interest in the serial correlation problem was
stimulated further by the new concepts of time series analysis discussed by
Wold [8].

We shall define the serial correlation coefficient for lag L and N observations
to be

$$ {}_LR_N = \frac{{}_LC_N}{V_N} = \frac{X_1X_{L+1} + X_2X_{L+2} + \cdots + X_NX_L - (\Sigma X_i)^2/N}{\Sigma X_i^2 - (\Sigma X_i)^2/N}, $$

where C and V are the covariance and variance respectively and the X's are
considered to be independently normally distributed about the same mean with
unit variance.¹ If the population variance were known a priori, the variates
could be transformed so that they would have unit variance; under such an
unusual circumstance, the only distribution required would be that of the serial
covariance. Tintner has given a test of significance for the serial covariance [6]
and for the correlation coefficient [7] by using a method of selected items. The
author has presented the distribution of the serial covariance and of the serial
correlation coefficient not corrected for the mean in a recent doctoral thesis [1].
The distributions of ${}_1R_N$ not corrected for the mean will be mentioned in the
sections which follow.
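The circular definition above can be sketched in a few lines of code. This is only an illustration, not part of the original paper: the function name `circular_serial_correlation` and its argument names are assumptions made here.

```python
def circular_serial_correlation(x, lag=1):
    """Circular serial correlation coefficient for the given lag, corrected for the mean.

    Hypothetical helper written for illustration; x is a list of observations.
    """
    n = len(x)
    total = sum(x)
    correction = total * total / n                     # (sum X_i)^2 / N
    # Circular lagged products: the last observations are paired with the first ones.
    cross = sum(x[i] * x[(i + lag) % n] for i in range(n))
    covariance = cross - correction                    # serial covariance C
    variance = sum(v * v for v in x) - correction      # V
    return covariance / variance
```

For the four observations 1, 2, 3, 4 and lag 1 the lagged products sum to 24, the correction term is 25 and the sum of squares is 30, so the coefficient is (24 - 25)/(30 - 25) = -0.2.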

2. Small sample distributions for lag 1. W. G. Cochran has suggested that
we use a result given in his article on quadratic forms to derive the distributions
of the serial correlation coefficient for small samples [3]. If $X_1, X_2, \cdots, X_N$
are independently normally distributed with variance 1 and mean 0, then

"Every quadratic form $\Sigma a_{ij}X_iX_j$ is distributed like $\sum_{i=1}^{r} \lambda_i u_i$, where r is the
rank of the matrix, A, of the quadratic form, the u's are independently
distributed as $\chi^2$ each with 1 d.f., and the $\lambda$'s are the non-zero latent roots
of the characteristic equation of A" [3, p. 179].

If each $\lambda_i$ appears $k_i$ times as a latent root, $u_i$ will be distributed as $\chi^2$ with
$k_i$ degrees of freedom.

¹ This circular definition of the serial correlation coefficient was suggested by H.
Hotelling.




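If we set L = 1 in the above definition of the serial covariance, we note that
the characteristic equation of ${}_1C_N$ is

$$ {}_1F_N(\lambda) = \begin{vmatrix} a_1 & a_2 & a_3 & \cdots & a_N \\ a_N & a_1 & a_2 & \cdots & a_{N-1} \\ \vdots & \vdots & \vdots & & \vdots \\ a_2 & a_3 & a_4 & \cdots & a_1 \end{vmatrix} = 0, $$

where $a_1 = -(\lambda + 1/N)$, $a_2 = a_N = (N - 2)/2N$, and all other a's $= -1/N$.
The determinant is a circulant, so that

$$ {}_1F_N(\lambda) = \prod_{k=1}^{N} \Big\{ \sum_{j=1}^{N} a_j \omega_k^{\,j-1} \Big\}, $$

where $\omega_k$ is the kth of the Nth roots of unity. Since

$$ \sum_{j=3}^{N-1} \omega_k^{\,j-1} = \begin{cases} -(\omega_k + 1 + \omega_k^{-1}) & \text{for } k \ne N, \\ N - 3 & \text{for } k = N, \end{cases} $$

the factor for $k = N$ reduces to $-\lambda$ and

$$ {}_1F_N = -\lambda \prod_{k=1}^{N-1} \{ -\lambda + (\omega_k + \omega_k^{-1})/2 \} = -\lambda \prod_{k=1}^{N-1} \{ -\lambda + \cos (2\pi k/N) \} = 0. $$

Hence $\lambda_k = \cos (2\pi k/N)$, $(k = 1, 2, \cdots, N - 1)$, and

$$ {}_1C_N = \begin{cases} \sum_{k=1}^{(N-1)/2} \lambda_k u_k & \text{for N odd}, \\ \sum_{k=1}^{(N-2)/2} \lambda_k u_k - u & \text{for N even}, \end{cases} $$

where $u_k$ is distributed as $\chi^2$ with 2 d.f. and u with 1 d.f. At the same time,
we note that $V_N = \Sigma (X_i - \bar{X})^2$ is distributed as $\chi^2$ with N - 1 d.f.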
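The latent roots $\cos(2\pi k/N)$ can be checked numerically. The sketch below is an illustration added here, not the author's procedure: it builds the matrix of the lag-1 quadratic form (diagonal $-1/N$, circular neighbours $(N-2)/2N$, all other entries $-1/N$) and measures how far the vector with entries $\cos(2\pi jk/N)$ is from being a latent vector. The names `lag1_matrix` and `latent_root_residual` are hypothetical.

```python
import math

def lag1_matrix(n):
    """Matrix A of the quadratic form sum X_i X_{i+1} - (sum X_i)^2 / N, for N >= 3."""
    a = [[-1.0 / n] * n for _ in range(n)]
    for i in range(n):
        for j in ((i + 1) % n, (i - 1) % n):
            a[i][j] += 0.5          # each circular neighbour carries an extra 1/2
    return a

def latent_root_residual(n, k):
    """Largest entry of A v - cos(2 pi k / n) v, where v_j = cos(2 pi j k / n)."""
    a = lag1_matrix(n)
    lam = math.cos(2 * math.pi * k / n)
    v = [math.cos(2 * math.pi * j * k / n) for j in range(n)]
    av = [sum(a[i][j] * v[j] for j in range(n)) for i in range(n)]
    return max(abs(av[i] - lam * v[i]) for i in range(n))
```

A residual at the level of rounding error for every k from 1 to N - 1 is consistent with the roots stated above; the remaining root, 0, belongs to the vector of ones.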

The general procedure in deriving the distribution of ${}_1R_N$ is as follows: We
determine the joint density function of the u's which form the distributions of
${}_1C_N (= {}_1R_N V_N)$ and $V_N$. The u's are integrated out, leaving the joint density
function of ${}_1R_N$ and $V_N$. The distribution of ${}_1R_N$ is obtained by integrating
with respect to $V_N$ from 0 to $\infty$. As examples, derivations of the distributions
of ${}_1R_6$ and ${}_1R_7$ have been included. In order to simplify the results, the first
subscripts have been dropped from ${}_1R_N$.

Distribution of $R_6$.² $R_6V_6 = \lambda_1u_1 + \lambda_2u_2 - u$ and $V_6 = u_1 + u_2 + u$, where
$u_1$ and $u_2$ are distributed as $\chi^2$ with 2 d.f. and u with 1 d.f. and $\lambda_1 = \tfrac12$ and
$\lambda_2 = -\tfrac12$. Hence the density function of the u's is

$$ D(u_1, u_2, u) = (4\sqrt{2\pi})^{-1} u^{-1/2} e^{-V_6/2}. $$







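Since $u_1 = [V_6(R_6 - \lambda_2) + u(1 + \lambda_2)]/(\lambda_1 - \lambda_2)$ and

$$ u_2 = [V_6(\lambda_1 - R_6) - u(1 + \lambda_1)]/(\lambda_1 - \lambda_2), $$

u must vary between 0 and $V_6(\lambda_1 - R_6)/(1 + \lambda_1)$ for $\lambda_2 < R_6 < \lambda_1$ and between
$V_6(\lambda_2 - R_6)/(1 + \lambda_2)$ and $V_6(\lambda_1 - R_6)/(1 + \lambda_1)$ for $-1 < R_6 < \lambda_2$. After
integrating with respect to u between these limits and then with respect to $V_6$
from 0 to $\infty$, we obtained the following density function for $R_6$:

$$ D(R_6) = \begin{cases} \dfrac{3}{2}\,\dfrac{\sqrt{\lambda_1 - R_6}}{\sqrt{1 + \lambda_1}\,(\lambda_1 - \lambda_2)} & \text{for } \lambda_2 \le R_6 \le \lambda_1, \\[2ex] \dfrac{3}{2}\left[\dfrac{\sqrt{\lambda_1 - R_6}}{\sqrt{1 + \lambda_1}\,(\lambda_1 - \lambda_2)} + \dfrac{\sqrt{\lambda_2 - R_6}}{\sqrt{1 + \lambda_2}\,(\lambda_2 - \lambda_1)}\right] & \text{for } -1 \le R_6 \le \lambda_2. \end{cases} $$

The cumulative probability function has the same general form:

$$ P(R_6 > R') = \begin{cases} \dfrac{(\lambda_1 - R')^{3/2}}{\sqrt{1 + \lambda_1}\,(\lambda_1 - \lambda_2)} & \text{for } \lambda_2 \le R' \le \lambda_1, \\[2ex] \dfrac{(\lambda_1 - R')^{3/2}}{\sqrt{1 + \lambda_1}\,(\lambda_1 - \lambda_2)} + \dfrac{(\lambda_2 - R')^{3/2}}{\sqrt{1 + \lambda_2}\,(\lambda_2 - \lambda_1)} & \text{for } -1 \le R' \le \lambda_2. \end{cases} $$

Distribution of $R_7$. $R_7V_7 = \lambda_1u_1 + \lambda_2u_2 + \lambda_3u_3$ and $V_7 = u_1 + u_2 + u_3$,
where each u is distributed as $\chi^2$ with 2 d.f. Hence,

$$ u_1 = \frac{V_7(R_7 - \lambda_2) + u_3(\lambda_2 - \lambda_3)}{\lambda_1 - \lambda_2}, \qquad u_2 = \frac{V_7(\lambda_1 - R_7) - u_3(\lambda_1 - \lambda_3)}{\lambda_1 - \lambda_2}. $$

For $\lambda_2 < R_7 < \lambda_1$, $0 < u_3 < V_7(\lambda_1 - R_7)/(\lambda_1 - \lambda_3)$; for $\lambda_3 < R_7 < \lambda_2$,
$V_7(\lambda_2 - R_7)/(\lambda_2 - \lambda_3) < u_3 < V_7(\lambda_1 - R_7)/(\lambda_1 - \lambda_3)$. Using these limits, we
derived the following density function for $R_7$:

$$ D(R_7) = \begin{cases} \dfrac{2(\lambda_1 - R_7)}{(\lambda_1 - \lambda_2)(\lambda_1 - \lambda_3)} + \dfrac{2(\lambda_2 - R_7)}{(\lambda_2 - \lambda_1)(\lambda_2 - \lambda_3)} & \text{for } \lambda_3 \le R_7 \le \lambda_2, \\[2ex] \dfrac{2(\lambda_1 - R_7)}{(\lambda_1 - \lambda_2)(\lambda_1 - \lambda_3)} & \text{for } \lambda_2 \le R_7 \le \lambda_1. \end{cases} $$

The cumulative probability function is similar, except that the coefficient 2
cancels and the exponent of each numerator is raised by one.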

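General formulas for N odd. It appears that the density function for $R_N$ and
$V_N$ for N odd is

$$ D(R_N, V_N) = K V_N^{(N-3)/2} e^{-V_N/2} \sum_{i=1}^{m} (\lambda_i - R_N)^{(N-5)/2}/\alpha_i \qquad \text{for } \lambda_{m+1} \le R_N \le \lambda_m, $$

where $\alpha_i = \prod_{j=1}^{(N-1)/2}{}' (\lambda_i - \lambda_j)$ for $j \ne i$ and $1/K = 2^{(N-1)/2}\,\Gamma[\tfrac12(N - 3)]$. This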


² Note that we are omitting the lag subscript from ${}_1R_6$.



4 


R. L, ANDErUKD-V 


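formula holds for N = 5 and 7; we will show that it holds for N + 2, assuming
that it holds for N. If we set $k = \tfrac12(N + 1)$, $R_{N+2}V_{N+2} = R_NV_N + \lambda_ku_k$ and $V_{N+2} =
V_N + u_k$; hence,

$$ R_N = \frac{R_{N+2}V_{N+2} - \lambda_ku_k}{V_{N+2} - u_k} \quad \text{and} \quad V_N = V_{N+2} - u_k. $$

If we make the substitution $u_k' = u_k/V_{N+2}$, the density function for $u_k'$, $V_{N+2}$,
and $R_{N+2}$ is

$$ \tfrac12 K V_{N+2}^{(N-1)/2} e^{-V_{N+2}/2} \sum_{i=1}^{m} \left[(\lambda_i - R_{N+2}) - u_k'(\lambda_i - \lambda_k)\right]^{(N-5)/2}/\alpha_i. $$

In order to obtain the distribution of $V_{N+2}$ and $R_{N+2}$, we must integrate out $u_k'$.
The limits of integration differ for different values of m. We note that

$$ u_k' = (R_N - R_{N+2})/(R_N - \lambda_k), $$

except that $u_k' = 0$ when $\lambda_k < R_N < \lambda_{m+1}$, since $\lambda_{m+1} \le R_{N+2} \le \lambda_m$ and $u_k'$
can not be negative. For $R_{N+2} > \lambda_k$, $u_k' < 1$; hence, if $R_N$ is replaced by a
larger (smaller) quantity, $u_k'$ will be larger (smaller).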

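For m = 1 ($\lambda_2 < R_{N+2} < \lambda_1$), we need to consider only that region for which
$\lambda_2 < R_N < \lambda_1$. In this region, $0 < u_k' < (\lambda_1 - R_{N+2})/(\lambda_1 - \lambda_k)$ and the density
function of $R_{N+2}$ and $V_{N+2}$ is

$$ D(R_{N+2}, V_{N+2}) = \phi(V_{N+2})\,(\lambda_1 - R_{N+2})^{(N-3)/2}/\alpha_1', $$

where $\phi(V_{N+2}) = V_{N+2}^{(N-1)/2} e^{-V_{N+2}/2} \big/ 2^{(N+1)/2}\,\Gamma[\tfrac12(N - 1)]$ and $\alpha_i' = \prod_{j=1}^{(N+1)/2}{}' (\lambda_i - \lambda_j)$.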

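For m = 2 ($\lambda_3 < R_{N+2} < \lambda_2$), we must consider two regions in the $R_N$ plane.
When $\lambda_2 < R_N < \lambda_1$,

$$ \frac{\lambda_2 - R_{N+2}}{\lambda_2 - \lambda_k} \le u_k' \le \frac{\lambda_1 - R_{N+2}}{\lambda_1 - \lambda_k}, $$

and when $\lambda_3 < R_N < \lambda_2$, $0 < u_k' < (\lambda_2 - R_{N+2})/(\lambda_2 - \lambda_k)$. If we combine the
density functions for these two regions, we find that

$$ D(R_{N+2}, V_{N+2}) = \phi(V_{N+2}) \sum_{i=1}^{2} (\lambda_i - R_{N+2})^{(N-3)/2}/\alpha_i' \qquad \text{for } \lambda_3 \le R_{N+2} \le \lambda_2. $$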

Similar results can be obtained for the other regions. 

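Finally we conclude that for N odd,

$$ D({}_1R_N) = \tfrac12(N - 3) \sum_{i=1}^{m} (\lambda_i - R_N)^{(N-5)/2}/\alpha_i \qquad \text{for } \lambda_{m+1} \le R_N \le \lambda_m, $$

$$ P({}_1R_N > R') = \sum_{i=1}^{m} (\lambda_i - R')^{(N-3)/2}/\alpha_i \qquad \text{for } \lambda_{m+1} \le R' \le \lambda_m, $$

where $\alpha_i = \prod_{j=1}^{(N-1)/2}{}' (\lambda_i - \lambda_j)$, $i \ne j$. The general density function for N odd and
${}_1R_N$ not corrected for the sample mean is [1]

$$ D({}_1R_N) = \tfrac12(N - 2) \sum_{i=m+1}^{(N-1)/2} ({}_1R_N - \lambda_i)^{(N-4)/2}/\alpha_i \qquad \text{for } \lambda_{m+1} \le {}_1R_N \le \lambda_m, $$

where $\alpha_i = \prod_{j=1}^{(N-1)/2}{}' (\lambda_j - \lambda_i)\,\sqrt{1 - \lambda_i}$, $i \ne j$.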

2—1 

General formulas for N even. Using the same method as above, we can show
that the same formulas hold for N even and ${}_1R_N$ corrected for the mean except
that in this case $\alpha_i = \prod_{j=1}^{(N-2)/2}{}' (\lambda_i - \lambda_j)\,\sqrt{\lambda_i + 1}$, $j \ne i$. No general formulas
were derived for N even and ${}_1R_N$ not corrected for the mean.
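The general cumulative formula for lag 1, $P({}_1R_N > R') = \sum_{i \le m} (\lambda_i - R')^{(N-3)/2}/\alpha_i$, with the extra factor $\sqrt{\lambda_i + 1}$ in $\alpha_i$ when N is even, lends itself to direct evaluation. The following is a minimal sketch of that reading of the formulas, not the author's tabulation method; the function name `lag1_tail_probability` is an assumption, and the sketch is meant for R' between -1 and 1 and N of at least 5.

```python
import math

def lag1_tail_probability(r, n):
    """P(1R_N > r) for the lag-1 coefficient corrected for the mean (sketch)."""
    # Distinct latent roots that carry 2 d.f. each, in decreasing order.
    lam = [math.cos(2 * math.pi * k / n) for k in range(1, (n - 1) // 2 + 1)]
    exponent = (n - 3) / 2
    total = 0.0
    for i, li in enumerate(lam):
        if li <= r:                  # only roots above r enter the sum
            break
        alpha = math.prod(li - lj for j, lj in enumerate(lam) if j != i)
        if n % 2 == 0:               # additional factor for N even
            alpha *= math.sqrt(1 + li)
        total += (li - r) ** exponent / alpha
    return total
```

Evaluated at the tabulated 5% point 0.370 for N = 7, or 0.345 for N = 6, the sketch returns a probability close to 0.05, which is a useful consistency check between the formulas and the table of significance points.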

3. Large sample distributions for lag 1. The simultaneous density function
of C and V, where we will drop the subscripts for convenience, is

$$ D(C, V) = (2\pi)^{-2} \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} \phi(s, t)\, e^{-sC - tV}\, ds\, dt, $$

$$ \phi(s, t) = K \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} e^{-\theta}\, dX_1\, dX_2 \cdots dX_N, $$

where $\theta = \tfrac12\{\Sigma X_i^2 - 2t[\Sigma(X_i - \bar{X})^2] - 2s[X_1X_2 + \cdots + X_NX_1 - (\Sigma X_i)^2/N]\}$
and s and t are pure imaginaries.

$\phi(s, t) = \Delta^{-1/2}$, where $\Delta$ is the determinant of the quadratic form $\theta$. This
determinant was evaluated by the method of circulants, and we find that $\Delta =
\prod_{k=1}^{N-1} \{1 - 2(t + s\lambda_k)\}$, where $\lambda_k = \cos 2\pi k/N$.

Set

$$ \log \phi(s, t) = \sum \kappa_{ij}\, \frac{s^i t^j}{i!\, j!}. $$

If $\log \phi$ is expanded in series, we find that $\kappa_{ij} = m!\, 2^m\, \Sigma\lambda_k^i$, where
m = (i + j - 1). For N > i, we might note these
summations: $\Sigma\lambda_k = -1$, $\Sigma\lambda_k^2 = \tfrac12(N - 2)$, $\Sigma\lambda_k^3 = -1$, $\Sigma\lambda_k^4 = \tfrac18(3N - 8)$,
$\Sigma\lambda_k^5 = -1$. Hence $\kappa_{10} = E(C) = -1$, $\kappa_{01} = E(V) = N - 1$, $\kappa_{20} = \sigma_C^2 =
(N - 2)$, $\kappa_{02} = \sigma_V^2 = 2(N - 1)$, $\kappa_{11} = \rho\sigma_C\sigma_V = -2$, $\kappa_{30} = -8$, $\kappa_{03} = 8(N - 1)$,
$\kappa_{21} = 4(N - 2)$, $\kappa_{12} = -8$, etc.

If we let $C' = C + 1$ and $V' = V - (N - 1)$, all of these semi-invariants
will remain unchanged except that $\kappa_{10} = \kappa_{01} = 0$. Hence

$$ R + \frac{1}{N - 1} = \frac{C'(N - 1) + V'}{(N - 1)(V' + N - 1)} \approx \frac{C'(N - 1) + V'}{(N - 1)^2}. $$

If we neglect terms of order less than 1/N, $E(R) = -1/(N - 1)$, $\sigma_R^2 =
(N - 2)/(N - 1)^2$, and $\kappa_k = 0$ for k > 2. For N < 75, a more exact approximation
may be desired.

If the above approximation is used, ${}_1R_N$ is normally distributed with mean
$-1/(N - 1)$ and variance $(N - 2)/(N - 1)^2$. The one-tailed significance
points can be found by substituting in the formulas

$$ {}_1R_N(.05) = \frac{-1 \pm 1.645\sqrt{N - 2}}{N - 1} \quad \text{or} \quad {}_1R_N(.01) = \frac{-1 \pm 2.326\sqrt{N - 2}}{N - 1}. $$

Refer to Fig. 2 for a comparison of the exact distribution and the normal
approximation. We might note a few comparisons between the
approximate significance points and the exact ones:

                 Positive tail                      Negative tail
             5%              1%               5%                1%
  N     Exact  Approx.  Exact  Approx.   Exact   Approx.   Exact   Approx.
  45    0.218  0.223    0.314  0.324    -0.262  -0.268    -0.356  -0.369
  75    0.173  0.176    0.250  0.255    -0.199  -0.203    -0.276  -0.282
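The normal approximation amounts to taking the mean $-1/(N - 1)$ and adding or subtracting the normal deviate times the standard deviation $\sqrt{N - 2}/(N - 1)$. A small sketch follows; the function name `normal_approx_point` and its `upper` flag are assumptions introduced for illustration.

```python
import math

def normal_approx_point(n, deviate, upper=True):
    """Approximate one-tailed significance point of the lag-1 coefficient.

    deviate is the normal deviate, e.g. 1.645 for 5% or 2.326 for 1%.
    """
    sign = 1.0 if upper else -1.0
    return (-1.0 + sign * deviate * math.sqrt(n - 2)) / (n - 1)
```

With N = 75 and the 5% deviate 1.645 this gives roughly 0.176 for the positive tail and -0.203 for the negative tail, in line with the approximate values compared with the exact ones above.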






For ${}_1R_N$ not corrected for the mean, it was found that a standardized variate y
(its defining expression is illegible in this copy) is
asymptotically normally distributed with mean 0 and variance 1 [1].

4. Significance points of ${}_1R_N$. An example of the methods used in tabulating
these significance points has been presented in the author's doctoral thesis [1].
The significance points for the values of N enclosed in parentheses have been
obtained by graphical interpolation. Note that N is the number of observations
in Table I.



5. Distributions for general lag, L. (a) Introduction. For a general lag, L,
the constants in the characteristic equation for the covariance ${}_LC_N$ are $a_1 =
-(\lambda + 1/N)$, $a_{L+1} = a_{N-L+1} = (N - 2)/2N$ and all other a's $= -1/N$. Hence
the characteristic equation is

$$ {}_LF_N = \lambda \prod_{k=1}^{N-1} [\lambda - \cos (2\pi Lk/N)] = 0. $$

Certain important generalizations concerning ${}_LF_N$ may be set down:

1. When L has no common factor with N, ${}_LF_N = {}_1F_N$.

2. When L and N have a common factor, $\alpha$, ${}_LF_N = ({}_1F_p)^{\alpha}(\lambda - 1)^{\alpha-1}$, where
$p = N/\alpha$, apart from a factor $\lambda^{\alpha-1}$ that contributes only zero roots.

2a. If $\alpha = L$, ${}_LF_N = ({}_1F_p)^{L}(\lambda - 1)^{L-1}$, where p = N/L.

The proof of the first statement was suggested by Cochran. Since
$\cos (\alpha + 2a\pi) = \cos \alpha$, where a is any integer, we must prove that the series of
numbers

L, 2L, $\cdots$, (N - 1)L,

when reduced modulus N can be arranged to form the series

1, 2, $\cdots$, (N - 1).

This proof can be found in most books on the theory of numbers [4]. Therefore,
we conclude that each term of the sequence $\{\cos (2\pi Lk/N)\}$ reduces

TABLE I

           Positive tail        Negative tail
   N        5%       1%          5%       1%
   5      0.253    0.297      -0.753   -0.798
   6      0.345    0.447      -0.708   -0.863
   7      0.370    0.510      -0.674   -0.799
   8      0.371    0.531      -0.625   -0.764
   9      0.366    0.533      -0.593   -0.737
  10      0.360    0.525      -0.564   -0.705
  11      0.353    0.515      -0.539   -0.679
  12      0.348    0.505      -0.516   -0.656
  13      0.341    0.495      -0.497   -0.634
  14      0.335    0.485      -0.479   -0.615
  15      0.328    0.475      -0.462   -0.597
  20      0.299    0.432      -0.399   -0.524
  25      0.276    0.398      -0.356   -0.473
  30      0.257    0.370      -0.325   -0.433
 (35)     0.242    0.347      -0.300   -0.401
 (40)     0.229    0.329      -0.279   -0.376
  45      0.218    0.314      -0.262   -0.356
 (50)     0.208    0.301      -0.248   -0.339
 (55)     0.199    0.289      -0.236   -0.324
 (60)     0.191    0.278      -0.225   -0.310
 (65)     0.184    0.268      -0.216   -0.298
 (70)     0.178    0.259      -0.207   -0.287
  75      0.173    0.250      -0.199   -0.276


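to one of the sequence $\{\cos (2\pi k/N)\}$ for k = 1, 2, $\cdots$, (N - 1), when L/N
is a prime fraction.

If L and N have a common factor, $\alpha$, $L = q\alpha$ and $N = p\alpha$ where p and q
are integers prime to one another. Hence,

$$ {}_LF_N = \lambda \prod_{k=1}^{p-1} \Big(\lambda - \cos \frac{2\pi k}{p}\Big)^{\alpha} (\lambda - 1)^{\alpha-1} = 0, $$

which has the same non-zero roots as $({}_1F_p)^{\alpha}(\lambda - 1)^{\alpha-1} = 0$.

If $\alpha = L$, ${}_LF_N = ({}_1F_p)^{L}(\lambda - 1)^{L-1}$, where p = N/L.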
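Because $\cos(2\pi Lk/N)$ depends only on Lk reduced modulo N, the statements about general lags can be checked with integer arithmetic alone. The sketch below is an added illustration with hypothetical function names; a residue r stands for the root $\cos(2\pi r/N)$, so residue 0 stands for the root 1.

```python
from collections import Counter

def reduced_multiples(lag, n):
    """The multiples L, 2L, ..., (N-1)L reduced modulo N, in increasing order."""
    return sorted((lag * k) % n for k in range(1, n))

def residue_multiplicities(lag, n):
    """How often each residue occurs among the reduced multiples."""
    return Counter((lag * k) % n for k in range(1, n))
```

For L = 3 and N = 7, which have no common factor, the reduced multiples are exactly 1, 2, ..., 6, so the roots are those of lag 1. For L = 4 and N = 12 (common factor 4, p = 3) the residues 4 and 8 each occur 4 times and residue 0 occurs 3 times, matching the exponents $\alpha$ and $\alpha - 1$ in the factorization.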





When these results are applied to the large sample distribution of ${}_LR_N$, we
find that it is independent of L. For the more important case in which p = N/L,
the semi-invariants $\kappa_{ij}$ for C and V are exactly the same for all L with a given N.
We see that

$$ K = \log \phi(s, t) = -\tfrac12 L \sum_{k=1}^{p-1} \log \{1 - 2(t + s\lambda_k)\} - \tfrac12(L - 1) \log \{1 - 2(t + s)\}, $$

where $\lambda_k = \cos (2\pi k/p)$. Hence, $\kappa_{ij} = m!\, 2^m \big[L\big(\sum_{k=1}^{p-1} \lambda_k^i + 1\big) - 1\big]$ with m = i + j - 1.
$\sum_{k=1}^{p-1} \lambda_k^i + 1$ is always 0 or a multiple of p when p > i; therefore, the p's cancel
and $\kappa_{ij}$ is the same for all p or for all L, since L = N/p. When $p \le i$, the
$\kappa_{ij}$'s will not be equal for all p. For example, $\kappa_{20} = 2(N - 1)$ for p = 2 and
$\kappa_{30} = 2(N - 4)$ for p = 3.
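The semi-invariants can be evaluated directly from the latent roots. The sketch assumes the series expansion $\kappa_{ij} = m!\,2^m \sum_k \lambda_k^i$ with m = i + j - 1 and $\lambda_k = \cos(2\pi Lk/N)$ for k = 1, ..., N - 1, which is how the expansion of $\log\phi$ is read here; the function name `semi_invariant` is hypothetical.

```python
import math

def semi_invariant(i, j, n, lag=1):
    """kappa_ij = m! * 2^m * sum of lambda_k^i over k = 1..N-1, with m = i + j - 1."""
    m = i + j - 1
    power_sum = sum(math.cos(2 * math.pi * lag * k / n) ** i for k in range(1, n))
    return math.factorial(m) * 2 ** m * power_sum
```

With N = 12 this reproduces $\kappa_{20} = N - 2 = 10$ for lag 1, while the special cases quoted above appear for lag 6 (p = 2), where $\kappa_{20} = 2(N - 1) = 22$, and for lag 4 (p = 3), where $\kappa_{30} = 2(N - 4) = 16$.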

(b) Distributions of ${}_LR_N$ when N/L = p. These results indicate that the
distributions of the serial correlation coefficients for which the number of
observations is divisible by the lag, so that N/L = p, would include the
distributions of all the serial correlation coefficients regardless of the values of N and L.
We will designate any lag L as the primary lag for a given N if N/L = p, an
integer. For example, two coefficients with the same N whose lags both have the
greatest common factor 2 with N have the same density function, but we will
derive only the density function for lag 2, which we will call the primary lag.
The case of p = 1 is trivial, since it involves correlating a series with itself.

To date, we have derived the exact density functions for p = 2 and p = 3
and the required integrals for p = 4. The significance points have been
tabulated in Table II. For simplicity of notation, we will set ${}_LR_N = {}_LR_p$ and
$V_N = V$.

Case p = 2 (N = 2L). ${}_LR_2V = -u_1 + u_2$ and $V = u_1 + u_2$, where $u_1$ is
distributed as $\chi^2$ with L d.f. and $u_2$ as $\chi^2$ with L - 1 d.f. Hence,

$$ D(u_1, u_2) = K u_1^{(L-2)/2} u_2^{(L-3)/2} e^{-V/2}, $$

where $1/K = 2^{(2L-1)/2}\,\Gamma(\tfrac12 L)\,\Gamma[\tfrac12(L - 1)]$. After substituting $u_1 = V(1 - {}_LR_2)/2$
and $u_2 = V(1 + {}_LR_2)/2$ and integrating with respect to V from 0 to $\infty$, we have

$$ D({}_LR_2) = \frac{(1 - {}_LR_2)^{(L-2)/2} (1 + {}_LR_2)^{(L-3)/2}}{2^{(2L-3)/2}\, B[\tfrac12 L, \tfrac12(L - 1)]}. $$

If we set $(1 - {}_LR_2) = 2y$, then the cumulative probability function is

$$ P({}_LR_2 > R') = \frac{1}{B[\tfrac12 L, \tfrac12(L - 1)]} \int_0^{x} y^{(L-2)/2} (1 - y)^{(L-3)/2}\, dy. $$

Pearson has tabulated the values of these incomplete Beta functions [5]. In
his notation, $P = I_x[\tfrac12 L, \tfrac12(L - 1)]$, where $x = \tfrac12(1 - R')$. For ${}_LR_2$ not
corrected for the mean, $P = I_x(\tfrac12 L, \tfrac12 L)$ [1].

Case p = 3 (N = 3L). ${}_LR_3V = -\tfrac12 u_1 + u$ and $V = u_1 + u$, where $u_1$ is
distributed as $\chi^2$ with 2L d.f. and u with L - 1 d.f. Therefore, $D(u_1, u) =
K u_1^{L-1} u^{(L-3)/2} e^{-V/2}$, where $1/K = 2^{(3L-1)/2}\,\Gamma(L)\,\Gamma[\tfrac12(L - 1)]$. After substituting
$u_1 = 2V(1 - {}_LR_3)/3$ and $u = V(1 + 2\,{}_LR_3)/3$ and integrating with respect to
V from 0 to $\infty$, we find that

$$ D({}_LR_3) = \frac{2^L (1 - {}_LR_3)^{L-1} (1 + 2\,{}_LR_3)^{(L-3)/2}}{3^{(3L-3)/2}\, B[L, \tfrac12(L - 1)]}. $$

If we set $x = 2(1 - R')/3$, $P({}_LR_3 > R') = I_x[L, \tfrac12(L - 1)]$. For ${}_LR_3$ not
corrected for the mean, $P = I_x[L, \tfrac12 L]$.

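Case p = 4 (N = 4L). ${}_LR_4V = -u_1 + u_2$ and $V = u_1 + u_2 + u$, where $u_1$
is distributed as $\chi^2$ with L d.f., $u_2$ with L - 1 d.f. and u with 2L d.f. The density
function of the u's is $D(u_1, u_2, u) = K u_1^{(L-2)/2} u_2^{(L-3)/2} u^{L-1} e^{-V/2}$, where $1/K =
2^{(4L-1)/2}\,\Gamma(\tfrac12 L)\,\Gamma[\tfrac12(L - 1)]\,\Gamma(L)$. Since $u_2 = [V(1 + {}_LR_4) - u]/2$ and $u_1 =
[V(1 - {}_LR_4) - u]/2$, $0 < u < V(1 - {}_LR_4)$ for ${}_LR_4 > 0$ and $0 < u < V(1 + {}_LR_4)$
for ${}_LR_4 < 0$. For ${}_LR_4 > 0$, the joint density function of ${}_LR_4$ and V is proportional to

$$ V e^{-V/2} \int_{u=0}^{V(1 - {}_LR_4)} [V(1 + {}_LR_4) - u]^{(L-3)/2}\, [V(1 - {}_LR_4) - u]^{(L-2)/2}\, u^{L-1}\, du. $$

For ${}_LR_4 < 0$, the expression is the same except that the upper limit for the integral is
$V(1 + {}_LR_4)$. If we make the substitution y = u/(upper limit) in each case and
then integrate with respect to V from 0 to $\infty$, we have these density functions:

$$ D({}_LR_4) = k\,(1 + {}_LR_4)^{(3L-3)/2} \int_0^1 y^{L-1} (1 - y)^{(L-3)/2}\, [(1 - {}_LR_4) - y(1 + {}_LR_4)]^{(L-2)/2}\, dy, \qquad \text{for } {}_LR_4 \le 0, $$

$$ D({}_LR_4) = k\,(1 - {}_LR_4)^{(3L-2)/2} \int_0^1 y^{L-1} (1 - y)^{(L-2)/2}\, [(1 + {}_LR_4) - y(1 - {}_LR_4)]^{(L-3)/2}\, dy, \qquad \text{for } {}_LR_4 \ge 0, $$

where $k = \Gamma[\tfrac12(4L - 1)] \big/ 2^{(2L-3)/2} \cdot \Gamma(L) \cdot \Gamma(\tfrac12 L) \cdot \Gamma[\tfrac12(L - 1)]$.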

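The probability integrals must be evaluated for each L. The cumulative
probability functions for L = 2 and 3 are:

$$ P({}_2R_4 > R') = \begin{cases} 1 - \dfrac{1}{\sqrt 2}(1 + R')^{5/2} + \tfrac12 R'^{3/2}(5 + R'), & \text{for } R' \ge 0, \\[1.5ex] 1 - \dfrac{1}{\sqrt 2}(1 + R')^{5/2}, & \text{for } R' \le 0, \end{cases} $$

$$ P({}_3R_4 > R') = \begin{cases} \dfrac{1}{2\sqrt 2}(1 - R')^{9/2}, & \text{for } R' \ge 0, \\[1.5ex] \dfrac{1}{2\sqrt 2}\left\{(1 - R')^{9/2} - (-R'/2)^{5/2}(22R'^2 + 36R' + 126)\right\}, & \text{for } R' \le 0. \end{cases} $$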
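The cumulative probability functions for p = 4 with L = 2 and L = 3 can be evaluated as closed forms. The expressions coded below are a reconstruction: they were re-derived from the stated distributions of the u's (degrees of freedom L, L - 1 and 2L) and agree with the legible fragments of the printed formulas, but they should be treated as an assumption rather than a transcription. The function names are hypothetical.

```python
import math

def tail_p4_lag2(r):
    """P(2R_4 > r) for L = 2, N = 8 (reconstructed closed form)."""
    base = 1 - (1 + r) ** 2.5 / math.sqrt(2)
    if r <= 0:
        return base
    return base + 0.5 * r ** 1.5 * (5 + r)

def tail_p4_lag3(r):
    """P(3R_4 > r) for L = 3, N = 12 (reconstructed closed form)."""
    value = (1 - r) ** 4.5
    if r < 0:
        value -= (-r / 2) ** 2.5 * (22 * r * r + 36 * r + 126)
    return value / (2 * math.sqrt(2))
```

Both functions equal 1 at r = -1 and 0 at r = 1, as a tail probability must. At the exact points listed for p = 4 in Table III (for instance 0.373 and 0.618 for L = 2) they return values close to 0.05 and 0.01.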

The formulas are simpler for R' > 0 when L is odd [...]. Significance points
for [...] intermediate points. It was noted that the significance points [...];
for comparisons see Table III below. Note that for L > 7 the 5% points [...]
and the 1% points are nearly accurate to two decimals.





TABLE II

Significance points of ${}_LR_N$ for p = 2 and 3¹

                  p = 2 (N = 2L)                          p = 3 (N = 3L)
        Positive tail     Negative tail         Positive tail     Negative tail
   L     5%      1%        5%       1%           5%      1%        5%       1%
   2    0.806   0.960      --       --          0.488   0.762    -0.496   -0.50
   3    0.729   0.907    -0.928   -0.994        0.447   0.677    -0.474   -0.496
   4    0.664   0.852    -0.848     --            --    0.610    -0.430   -0.480
   5    0.612   0.802    -0.773   -0.902        0.373   0.559    -0.406   -0.461
   6    0.571   0.759    -0.712   -0.856        0.346   0.518    -0.377   -0.440
   7    0.536   0.721    -0.662   -0.812        0.324   0.485    -0.354   -0.420
   8    0.507   0.688    -0.620   -0.774          --      --     -0.334   -0.402
   9    0.483   0.659    -0.585   -0.739          --    0.433    -0.316   -0.387
  10    0.462   0.634    -0.554     --          0.278   0.413    -0.301   -0.373
  12    0.428   0.590    -0.505   -0.656          --    0.380    -0.276   -0.347
  14    0.399   0.554    -0.467     --          0.239   0.353    -0.256   -0.326
  16    0.376   0.523    -0.436   -0.577        0.225   0.332    -0.240   -0.308
  18    0.357   0.498    -0.410   -0.546          --    0.314    -0.227   -0.293
  20    0.340   0.476    -0.389   -0.520          --    0.298    -0.215   -0.280
  25    0.308   0.432    -0.347     --          0.182   0.268    -0.193   -0.254
  30    0.282   0.398    -0.317   -0.431        0.167   0.245    -0.176   -0.234
  40    0.247   0.348    -0.273   -0.374          --    0.212    -0.153   -0.205
  50    0.222   0.314    -0.243   -0.335          --      --     -0.136     --


TABLE III¹

Significance points for p = 4

                      Positive tail                          Negative tail
                 5%                1%                   5%                 1%
   L    N    Exact  Table 1   Exact  Table 1      Exact   Table 1    Exact   Table 1
   2    8    0.373   0.371    0.618   0.531         --    -0.625    -0.818   -0.764
   3   12      --    0.348    0.547   0.505       -0.528  -0.516    -0.692   -0.656
   4   16    0.326*  0.322    0.490*  0.466       -0.451  -0.447    -0.604   -0.580
   5   20    0.301   0.299    0.451   0.432       -0.402* -0.399    -0.543*  -0.524
   6   24    0.281*    --     0.419*  0.404         --    -0.363    -0.497   -0.482
   7   28    0.264   0.264    0.392   0.380       -0.338* -0.337    -0.460*  -0.448

¹ L is the lag and p = N/L.

* indicates interpolated values.






























Case p > 4. We have not set up any of the density functions for p > 4;
however, it appears that the significance points given for lag 1 would be
accurate enough for the higher lags. The exact significance points for lag 2 have
been derived for p = 5 and 7. The reader may note the close approximation
given by the significance points for lag 1 when p = 7. We hope to check the
lag 1 approximation for other lags in the near future.

TABLE IV

Some significance points for lag 2

                 Positive tail        Negative tail
                  5%       1%          5%       1%

                        p = 5 (N = 10)
   Exact        0.342    0.540         --       --
   Approx.      0.360    0.525      -0.564   -0.705

                        p = 7 (N = 14)
   Exact        0.335    0.482      -0.470      --
   Approx.      0.335    0.485      -0.479   -0.615

7. Summary.

1. The exact and large sample distributions have been derived
for the serial correlation coefficient for lag 1 and the exact significance points
tabulated for N, the number of observations, up to 75; for N > 75, the large
sample approximations can be used.

2. It has been noted that the distributions for any lag L are the same as those
for lag 1 when L and N are prime to each other. In general the distribution of
the serial correlation coefficient can be derived for any L and N by using only
those distributions for which L is a factor of N. The distributions and
significance points have been derived for N/L = p = 2, 3 and 4. For p > 4 (N > 4L),
the significance points given for lag 1 probably can be used when L is greater
than 4 or 5. The accuracy of this approximation has been checked for lag 2.

3. These significance points should be useful in determining the methods of
studying a time series, as suggested by Wold, and in the formulation of a better
test of the significance of regression coefficients when we know that the
observations are correlated in time. In addition, we now have a method of testing our
assumptions of independence for any set of data.


REFERENCES

[1] R. L. Anderson, Serial Correlation in the Analysis of Time Series, unpublished thesis, Iowa, 1941.
[2] M. S. Bartlett, "Some aspects of the time-correlation problem in regard to tests of significance," Roy. Stat. Soc. Jour., Vol. 98 (1935), pp. 536-543.
[3] W. G. Cochran, "Distribution of quadratic forms in a normal system, with applications to the analysis of covariance," Camb. Phil. Soc. Proc., Vol. 30 (1934), pp. 178-191.
[4] L. E. Dickson, Modern Elementary Theory of Numbers, Univ. of Chicago Press, 1939.
[5] Karl Pearson (Editor), Tables of the Incomplete Beta Function, Cambridge Univ. Press, 1934.
[6] G. Tintner, "Tests of significance in time series," Annals of Math. Stat., Vol. 10 (1939), pp. 139-143.
[7] G. Tintner, The Variate Difference Method, Principia Press, Bloomington, Ind., 1940.
[8] H. Wold, A Study in the Analysis of Stationary Time Series, Almqvist and Wiksells Boktryckeri-A.-B., Uppsala, 1938.
[9] G. U. Yule, "On the time-correlation problem," Roy. Stat. Soc. Jour., Vol. 84 (1921), pp. 497-537.



SERIAL CORRELATION AND QUADRATIC FORMS IN NORMAL 

VARIABLES¹


By Tjalling Koopmans

Penn Mutual Life Insurance Company

1. Estimation problems of stochastical processes. In regression analysis of
economic time series a situation often arises in which a certain observed
quantity represents a "dependent" variable at one time and an "independent"
variable at a later time. For instance, the following relations may exist between
the price $x_t$ and the supply $y_t$ of hogs at any time t:

$$ (1) \qquad x_t = \alpha - \beta y_t + z_t', \qquad y_t = \gamma + \delta x_{t-1} + z_t''. $$

The first of these equations expresses the price-depressing influence of large
supplies. The second equation expresses the supply-stimulating influence of
high prices one time unit (in the case of hogs, about 18 months) earlier. The
terms $z_t'$ and $z_t''$ represent influences of additional variables and/or random
disturbances. Elimination of $y_t$ leads to

$$ (2) \qquad x_t = \epsilon - \zeta x_{t-1} + z_t. $$

The statistical e.stimation of the parameters t and f of Mirli an equation e* 
usually attempted by the ordinary least squares metliod, di’^regnrdiiig the faet 
that the observation Xi is both a dependent variable at tinu' f and an inde¬ 
pendent variable at time I -f 1. Tlie following «implf> example h}iouh that 
may lead to erroneous results particularly in small .saiufileo. SiipfWM' that» - f). 
{■ = -I, and that zt is a purely random variable with mean t), while only thtw 
successive observations are available. Tlie least squares i*stimate of ij ii* then 
given by tbc slope of the straight line connecting the jHiinla i.xt , Xfl ami </}, /*! 
in the plane of Xi_i and Xi. This aloj)e, however, ha« an expectwl value 0, 
because according to our assumptions the conditional expeelalion of Xs fnr fi 
prescribed value of ij is equal to xj, whatever value that is. Thus Uie lejwt 
squares estimate of f = -1 has an expected value 0 showing an im}M)rtant hm,s. 
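The three-observation example can be illustrated by a small simulation. The following is only a sketch under added assumptions: the disturbances are taken to be standard normal, the starting value $x_1$ is drawn from the same distribution, and the function name `three_point_slopes` is hypothetical. Since the slope is then a ratio of two independent normal variables, its sample mean is unstable, and the sketch reports the median as the measure of centre instead.

```python
import random
import statistics

def three_point_slopes(n_samples, seed=0):
    """Slope of the line through (x1, x2) and (x2, x3) when x_t = x_{t-1} + z_t."""
    rng = random.Random(seed)
    slopes = []
    for _ in range(n_samples):
        x1 = rng.gauss(0.0, 1.0)        # starting value (an assumption of this sketch)
        x2 = x1 + rng.gauss(0.0, 1.0)   # epsilon = 0, zeta = -1 in equation (2)
        x3 = x2 + rng.gauss(0.0, 1.0)
        slopes.append((x3 - x2) / (x2 - x1))
    return slopes

slopes = three_point_slopes(20001)
# The true coefficient of x_{t-1} is 1, yet the slopes centre on 0.
print(statistics.median(slopes))
```

The slopes scatter symmetrically around 0 rather than around the true coefficient, which is the bias described in the text.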

Mathematical business cycle theories utilize systems of equations much more complicated than the example considered [1]. The common feature of these equation systems is, however, that they reduce fluctuations in a set of economic variables to

1. earlier fluctuations in the same set of variables, 

2. changes in given non-economic or external variables, and 

3. random disturbances, 


¹ This investigation was carried out at the Local and State Government Section (Princeton Surveys) of the School for Public and International Affairs of Princeton University.




An equation system of this type has been said to define a stochastical process in a number of variables [2]. The statistical testing of mathematical business cycle theories accordingly requires a theory of estimation of the parameters of stochastical processes. The operation of stochastical processes is also apparent in meteorological data. Assuming a normal distribution for the random disturbances, it will be seen that the mathematical prerequisite for an estimation theory of stochastical processes is the study of joint distributions of certain quadratic forms in normal variables.

In this article only the very simplest problem of this class will be treated, namely that of testing the significance of $\zeta$ in equation (2) if it is known that $|\zeta| < 1$ and that $\epsilon$ is equal to zero. This is the problem of testing the significance of single serial regression, or of single serial correlation, because the distinction between single regression and correlation coefficients disappears in this simple case for coefficients absolutely smaller than unity.

In the next section the problem of estimating single serial correlation if the mean is known will be stated and the difficulties involved will be discussed. In section 3 a conditional distribution of a quadratic form in normal variables will be derived. The proof in section 3 covers only forms in five or more variables, but another proof covering any number of variables is given in section 4. This distribution is then applied to devise a test of significance of serial correlation in section 5. The reading of section 4 is not necessary for the understanding of section 5. Readers desiring to locate only the main results can read those from equations (3), (11), (16), (21), (36), (61), (62), (74), (79), (82), (92), and (90).

2. The estimation of serial correlation. In the stochastical process

(3)    $x_t = \rho x_{t-1} + z_t,$

where the $z_t$ are independent drawings from a normal distribution with mean 0 and standard deviation $\sigma$, the parameter $\rho$ may have any positive or negative value. The process will only be a stationary one if

(4)    $|\rho| < 1.$

For, since

(5)    $E x_t = E x_{t-1} = E z_t = 0, \qquad E z_t^2 = \sigma^2,$

and

(6)    $E x_t^2 = \rho^2 E x_{t-1}^2 + \sigma^2,$

a variance of $x_t$ independent of $t$ will be possible only if (4) is satisfied, in which case

(7)    $E x_t^2 = \dfrac{\sigma^2}{1 - \rho^2}.$

If (4) is not satisfied, however, $E x_t^2$ will be an ever increasing function of $t$, tending to infinity in approximately geometric progression if $|\rho| > 1$. In this article the limitation (4) will be imposed a priori.
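The role of condition (4) can be seen by iterating the variance recursion (6) numerically. The sketch below is illustrative only; the function name `variance_path`, the starting variance 0 and the particular values of $\rho$ and $\sigma$ are assumptions made for the example.

```python
def variance_path(rho, sigma, v0, steps):
    """Iterate E x_t^2 = rho^2 * E x_{t-1}^2 + sigma^2, as in equation (6)."""
    path = [v0]
    for _ in range(steps):
        path.append(rho * rho * path[-1] + sigma * sigma)
    return path

stationary = variance_path(rho=0.5, sigma=1.0, v0=0.0, steps=60)
explosive = variance_path(rho=1.5, sigma=1.0, v0=0.0, steps=10)

limit = 1.0 ** 2 / (1.0 - 0.5 ** 2)            # sigma^2 / (1 - rho^2), the stationary variance
growth = explosive[-1] / explosive[-2]         # approaches rho^2 when |rho| > 1
print(stationary[-1], limit, growth)
```

With $|\rho| < 1$ the iterates settle at $\sigma^2/(1-\rho^2)$, while with $|\rho| > 1$ successive variances grow by a factor approaching $\rho^2$.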

It follows from (3), (7), and the assumption of normality that the joint distribution of the quantities $x_1, z_2, z_3, \ldots, z_T$ is given by

(8)    $\dfrac{(1-\rho^2)^{1/2}}{(2\pi\sigma^2)^{T/2}} \exp\left\{-\dfrac{1}{2\sigma^2}\left[(1-\rho^2)\,x_1^2 + \sum_{t=2}^{T} z_t^2\right]\right\} dx_1\, dz_2 \cdots dz_T.$

Since the Jacobian of the transformation from the variables $x_1, z_2, \ldots, z_T$ to the variables $x_1, x_2, \ldots, x_T$ equals unity, the joint distribution of $T$ successive observations $x_1, x_2, \ldots, x_T$ that make up a complete sample is obtained by replacing the $z_t$ in (8) by the corresponding expressions in the $x_t$ according to (3). This leads to the distribution

(9)    $\dfrac{(1-\rho^2)^{1/2}}{(2\pi\sigma^2)^{T/2}} \exp\left\{-\dfrac{1}{2\sigma^2}\left[l - 2\rho m + (1+\rho^2)\,n\right]\right\} dx_1 \cdots dx_T,$

in which the three quadratic forms

(10)   $l = x_1^2 + x_T^2,$
       $m = x_1x_2 + x_2x_3 + \cdots + x_{T-1}x_T,$
       $n = x_2^2 + x_3^2 + \cdots + x_{T-1}^2$

are the only characteristics of the sample that enter. In other words, $l$, $m$ and $n$ are jointly sufficient statistics for the estimation of $\rho$ and $\sigma$. It may be noted that these statistics remain the same if the series of observations is taken in inverse order.
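The passage from (8) to (9) rests on the identity $(1-\rho^2)x_1^2 + \sum_{t=2}^{T}(x_t - \rho x_{t-1})^2 = l - 2\rho m + (1+\rho^2)n$. A short numerical check is sketched below; the sample values and the helper names `sufficient_statistics` and `exponent_from_disturbances` are made up for the illustration.

```python
def sufficient_statistics(x):
    """Return (l, m, n) as defined in (10) for a sample x_1, ..., x_T."""
    l = x[0] ** 2 + x[-1] ** 2
    m = sum(a * b for a, b in zip(x, x[1:]))
    n = sum(v * v for v in x[1:-1])
    return l, m, n

def exponent_from_disturbances(x, rho):
    """Bracket in the exponent of (8), with z_t = x_t - rho * x_{t-1}."""
    z = [x[t] - rho * x[t - 1] for t in range(1, len(x))]
    return (1 - rho ** 2) * x[0] ** 2 + sum(v * v for v in z)

x = [0.3, -1.2, 0.8, 0.5, -0.4, 1.1]   # arbitrary illustrative sample
rho = 0.4
l, m, n = sufficient_statistics(x)
direct = exponent_from_disturbances(x, rho)
via_stats = l - 2 * rho * m + (1 + rho ** 2) * n
reversed_stats = sufficient_statistics(x[::-1])
print(direct, via_stats, reversed_stats)
```

Both routes give the same exponent, and reversing the series leaves $l$, $m$ and $n$ unchanged.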

It seems natural to attempt maximum likelihood estimation of $\rho$ and $\sigma$, even if the usual optimal properties of estimates so obtained have so far not been proved for stochastical processes. Straightforward calculations lead to the following third-degree equation for the maximum likelihood estimate $\hat\rho$ of $\rho$:

(11)   $(m - \hat\rho n)(1 - \hat\rho^2) - \dfrac{\hat\rho}{T}\left[l - 2\hat\rho m + (1 + \hat\rho^2)\,n\right] = 0.$

Of course the root asymptotically approaching $m/n$ has to be selected. The corresponding maximum likelihood estimate $\hat\sigma^2$ of $\sigma^2$ is given by

(12)   $\hat\sigma^2 = \dfrac{1}{T}\left[l - 2\hat\rho m + (1 + \hat\rho^2)\,n\right].$
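A minimal numerical sketch of solving the third-degree equation for the maximum likelihood estimate of $\rho$ follows. It reads that equation as $(m - \rho n)(1-\rho^2) - (\rho/T)[l - 2\rho m + (1+\rho^2)n] = 0$ and uses bisection on $(-1, 1)$; this choice of method, the function name `ml_estimates` and the sample values are assumptions of the illustration, not part of the original argument. Bisection applies because the left-hand side equals $\sum (x_t + x_{t+1})^2/T \ge 0$ at $\rho = -1$ and $-\sum (x_t - x_{t+1})^2/T \le 0$ at $\rho = +1$.

```python
def ml_estimates(x, tol=1e-12):
    """Return (rho_hat, sigma2_hat, residual) from the cubic (11) and formula (12)."""
    T = len(x)
    l = x[0] ** 2 + x[-1] ** 2
    m = sum(a * b for a, b in zip(x, x[1:]))
    n = sum(v * v for v in x[1:-1])

    def cubic(rho):
        return (m - rho * n) * (1 - rho ** 2) - rho / T * (l - 2 * rho * m + (1 + rho ** 2) * n)

    lo, hi = -1.0, 1.0              # cubic(-1) >= 0 >= cubic(1)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if cubic(mid) > 0:
            lo = mid                # root lies to the right of mid
        else:
            hi = mid
    rho_hat = 0.5 * (lo + hi)
    sigma2_hat = (l - 2 * rho_hat * m + (1 + rho_hat ** 2) * n) / T
    return rho_hat, sigma2_hat, cubic(rho_hat)

rho_hat, sigma2_hat, residual = ml_estimates([0.3, -1.2, 0.8, 0.5, -0.4, 1.1])
print(rho_hat, sigma2_hat, residual)
```

Because the leading coefficient of the cubic is positive, the sign pattern at $\pm 1$ leaves exactly one root inside $(-1, 1)$ whenever $n > 0$ and the end values are not zero, so the bisection cannot pick a spurious root there.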

In view of the complicated definition of $\hat\rho$ it seems desirable as a next step to derive from (9) the joint probability distribution of $l$, $m$ and $n$. This requires a transformation of the volume element $dx_1 \cdots dx_T$ in (9) to the form

(13)   $\varphi(l, m, n)\, dl\, dm\, dn,$

which it assumes after integration over the $T - 3$ other coordinates the variation of which does not change $l$, $m$ and $n$.

Since this is purely a problem of integration completely defined by the expressions (10), the resulting function $\varphi(l, m, n)$ is independent of $\rho$ and $\sigma$. The joint distribution

(14)   $\dfrac{(1-\rho^2)^{1/2}}{(2\pi\sigma^2)^{T/2}} \exp\left\{-\dfrac{1}{2\sigma^2}\left[l - 2\rho m + (1+\rho^2)\,n\right]\right\} \varphi(l, m, n)\, dl\, dm\, dn$

of $l$, $m$ and $n$ will thus be known for any values of $\rho$ and $\sigma$ as soon as it is known for two particular values.

If as particular values we choose $\rho = 0$ and $\sigma = 1$, the $x_t$ become identical with the $z_t$, and the problem is that of finding the joint distribution of three quadratic forms (10) in independent normal variables with mean 0 and variance 1. Even if so simplified, the problem is a complicated one. While there are infinitely many common sets of principal axes of the forms $l$ and $n$, none of these sets of axes has a single axis in common with $m$.

Although no solution is offered for this problem, the following suggestion may be ventured. Once $\varphi(l, m, n)$ is known, the mathematically simplest procedure for interval estimation of $\rho$ could well be one that confines attention to samples having the same values of $l$ and $n$ as the sample actually obtained. Suitably chosen percentiles of the conditional distribution of $m$ with $l$ and $n$ fixed at the observed values would be convertible into confidence limits for $\rho$ with the help of (14).

A simpler mathematical problem is encountered in testing whether the existence of a difference between $\rho$ and 0 can be established, or, in other words, in testing the significance of serial correlation. If $\rho = 0$, the distribution function in (14) depends only on $p = l + n$, not on $l$ or $n$ separately, and exact significance limits can be derived from the joint distribution

(15)   $\dfrac{1}{(2\pi\sigma^2)^{T/2}}\, e^{-p/2\sigma^2}\, \psi(p, m)\, dp\, dm$

of $p$ and $m$ only. This distribution will be studied in the next three sections. It is hoped that the methods there applied will provide a useful starting point in the treatment of other problems of the class described in section 1.


3. Distribution of a quadratic form in normal variables on the unit sphere. Consider two quadratic forms in $T$ independent normal variables $x_t$ with mean 0 and variance 1,

(16)   $p = \sum_{t=1}^{T} x_t^2, \qquad q = \sum_{t=1}^{T} \kappa_t x_t^2.$

While the characteristic values of the form $p$ are all coincident with the same value 1, the characteristic values $\kappa_t$ of $q$ are provisionally supposed to be different from each other, so that they can be arranged in decreasing order

(17)   $\kappa_1 > \kappa_2 > \cdots > \kappa_T.$

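The probability density

(18)   $(2\pi)^{-T/2}\, e^{-\frac12 p}$

in the space of the variables $x_t$ is constant on any sphere

(19)   $p = p_0 = \text{constant},$

while the distribution function $g(p)$ of $p$ is that of the $\chi^2$-distribution with $T$ degrees of freedom

(20)   $g(p) = \dfrac{p^{\frac12 T - 1}\, e^{-\frac12 p}}{2^{\frac12 T}\,\Gamma(\frac12 T)}.$

The hyper-surfaces on which the ratio

(21)   $r = \dfrac{q}{p}$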


of $q$ to $p$ is constant are cones with the origin as vertex dissecting the same proportion of the metric "surface" of each sphere (19). It follows that the conditional distribution function of $r$ for a prescribed value $p_0$ of $p$ is independent of that value $p_0$, and is therefore equal to the unrestricted distribution function $h(r)$ of $r$. In other words, $p$ and $r$ are independently distributed. Their joint distribution being

(22)   $g(p)\,h(r)\, dp\, dr,$

the joint distribution of $p$ and $q = rp$ is found to be

(23)   $f(p, q)\, dp\, dq = \dfrac{1}{p}\, g(p)\, h\!\left(\dfrac{q}{p}\right) dp\, dq.$

The function $h(\ )$ may therefore also be described as the conditional distribution function of $q$ on the unit sphere

(24)   $p = 1.$

Since $\kappa_1$ and $\kappa_T$ are the extreme values of $q$ under the condition (24), the function $h(r)$ vanishes outside these limits.

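We shall now derive an expression for $h(r)$ by comparing (23) with an expression for $f(p, q)$ obtained through the inversion theorem of characteristic functions. The characteristic function $F(\eta, \theta)$ corresponding to the variables $p$ and $q$ is

(25)   $F(\eta, \theta) = (2\pi)^{-T/2} \int \cdots \int e^{-\frac12 p + i\eta p + i\theta q}\, dx_1 \cdots dx_T = D(\eta, \theta)^{-1/2},$

where, according to (16), the polynomial $D(\eta, \theta)$ is given by

(26)   $D(\eta, \theta) = \prod_{t=1}^{T} (1 - 2i\eta - 2i\theta\kappa_t).$

It follows from the inversion theorem that

(27)   $f(p, q) = (2\pi)^{-2} \int\!\!\int e^{-i\eta p - i\theta q}\, D(\eta, \theta)^{-1/2}\, d\eta\, d\theta,$

the order of integration over $\eta$ and $\theta$ being immaterial. Any elementary factor of $D(\eta, \theta)$ may be written

(28)   $d_t(\eta, \theta) = 1 - 2i\eta - 2i\theta\kappa_t = (1 - 2i\eta)\left(1 - \dfrac{2i\theta\kappa_t}{1 - 2i\eta}\right).$

Figure 1. Paths of integration in the $\kappa$-plane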


First considering the integration over $\theta$ (while $\eta$ has some fixed value), we may instead of $\theta$ use

(29)   $\kappa = \dfrac{1 - 2i\eta}{2i\theta}$

as an integration variable. The path of integration $c_\eta$ in the $\kappa$-plane then is a straight line from 0 to $-\dfrac{1 - 2i\eta}{2i}\,\infty$ and another straight line from $\dfrac{1 - 2i\eta}{2i}\,\infty$ back to 0, as indicated in Figure 1, and the transformed integral (27) runs

(30)   $f(p, q) = -(2\pi)^{-2} \int_{-\infty}^{\infty} e^{-i\eta p}\,(1 - 2i\eta)^{-\frac12 T + 1}\, d\eta \int_{c_\eta} \dfrac{1}{2i\kappa^2}\, e^{-(1 - 2i\eta)q/2\kappa} \prod_{t=1}^{T}\left(1 - \dfrac{\kappa_t}{\kappa}\right)^{-1/2} d\kappa.$

The integrand

(31)   $\kappa^{-2}\, e^{-(1 - 2i\eta)q/2\kappa} \prod_{t=1}^{T}\left(1 - \dfrac{\kappa_t}{\kappa}\right)^{-1/2}$

for the integration over $\kappa$ has singularities only in the points $\kappa = 0$ and $\kappa = \kappa_t$, $t = 1, 2, \ldots, T$. In order to simplify the argument we shall assume that the quadratic form $q$ is positive definite, or, in connection with (17), that $\kappa_T > 0$. The location of the singularities is then as pictured in Figure 1. At $\kappa = \infty$ the integrand (31) is regular and of the order of magnitude $\kappa^{-2}$. Consequently a curve integral of (31) along the whole or any part of the circle $|\kappa| = R$ will tend to 0 if $R$ tends to infinity. Using a theorem of Cauchy, it is permissible in (30) to replace the described path $c_\eta$ by another path, which starts out along $c_\eta$ from 0 up to the point where $c_\eta$ meets the circle $|\kappa| = R$, from there follows the circle $|\kappa| = R$ to the right over an angle $\pi$ up to the diametrically opposite point, and from there returns to 0 along $c_\eta$, provided that $R > \kappa_1$. After reversing the direction in which the path is followed in order to do away with the negative sign in (30), the path so obtained can again be replaced by the path $\gamma_\delta$ shown in Figure 1, which coincides with $c_\eta$ only up to a small distance $\delta$ from the real axis, and encircles all singularities $\kappa_t$ while retaining a distance $\delta$ from the part of the real axis to the left of and up to $\kappa_1$. Finally, a path of integration $\gamma'$ independent of the value of $\eta$ is obtained by going to the limit in which $\delta = 0$. This means an integration twice along the part of the real axis between 0 and $\kappa_1$, integrating from 0 to $\kappa_1$ that branch of the integrand which is obtained by passing "under" each singularity, and going back from $\kappa_1$ to 0 with the branch obtained by going "around" $\kappa_1$ and "over" each other singularity². The integral so obtained converges at each singularity. This is also true for the singularity $\kappa = 0$ because we are dealing only with positive values of $q$, which makes the exponential factor in (31) tend to 0 if $\kappa$ approaches zero. We shall now show that if in (30) the path $\gamma'$ is substituted for $c_\eta$ (with a change in sign), the order of integration over $\kappa$ and $\eta$ can be reversed if $T \ge 5$.

The integral over $\eta$, taken from (30),

(32)   $I = (2\pi)^{-1} \int_{-\infty}^{\infty} (1 - 2i\eta)^{-\frac12 T + 1}\, e^{-i\eta(p - q/\kappa)}\, d\eta$

(in which $\kappa$ is now a positive real number), is by the substitution $p' = p - q/\kappa$ transformed to the integral encountered in the derivation of the $\chi^2$-distribution (with $T - 2$ degrees of freedom) by the inversion theorem of characteristic functions. It may be quoted without proof (see [3] p. 12) that it equals

(33)   $I = \dfrac{(p - q/\kappa)^{\frac12 T - 2}\, e^{-\frac12 (p - q/\kappa)}}{2^{\frac12 T - 1}\,\Gamma(\frac12 T - 1)}$   if $p - q/\kappa > 0$, or $\kappa > r$,
       $I = 0$   if $p - q/\kappa \le 0$, or $\kappa \le r$.

² For even values of $T$ the parts of $\gamma'$ for which $\kappa < \kappa_T$ can be disregarded, because on these parts the same branch of the integrand is integrated in opposite directions.



It is necessary to observe, however, that the integral $I$ converges uniformly for all real values of $\kappa$ whenever $T \ge 5$, because then

(34)   $\int_{-\infty}^{\infty} |1 - 2i\eta|^{-\frac12 T + 1}\, d\eta$

is convergent. Because of this property, the reversal of the order of integration is allowed for $T \ge 5$.

If now in (30) we first carry out the integration over $\eta$ and use (33), we are left with

(35)   $f(p, q) = \dfrac{e^{-\frac12 p}}{2\pi i\; 2^{\frac12 T}\,\Gamma(\frac12 T - 1)} \int_{\gamma_r} \dfrac{\kappa^{\frac12 T - 2}\,(p - q/\kappa)^{\frac12 T - 2}}{\prod_{t=1}^{T} (\kappa - \kappa_t)^{1/2}}\, d\kappa,$

where now $\gamma_r$ is any curve proceeding from $\kappa = r$ into the lower half-plane, crossing the real axis at a point $\kappa > \kappa_1$, and returning to $\kappa = r$ through the upper half-plane, as indicated in Figure 2. (The path directly obtained is a path $\gamma'_r$ consisting of twice the real axis between $r$ and $\kappa_1$, the branches of the integrand being taken as indicated by $\gamma_r$.) Comparing (35) with (23) and (20), using (21) and the well-known formula $\Gamma(x) = (x - 1)\Gamma(x - 1)$, we find the following expression for the distribution function of $r$:

(36)   $h(r) = \dfrac{\frac12 T - 1}{2\pi i} \int_{\gamma_r} \dfrac{(\kappa - r)^{\frac12 T - 2}}{\prod_{t=1}^{T} (\kappa - \kappa_t)^{1/2}}\, d\kappa.$

Figure 2. The integration path $\gamma_r$

This function vanishes for $r \ge \kappa_1$. In order to arrive at a positive distribution function for $\kappa_T < r < \kappa_1$ that branch of the integrand must be selected which is positive for real values of $\kappa$ exceeding $\kappa_1$.

It is worth noting that the degree in $\kappa$ of the numerator of the integrand is two less than that of the denominator. Owing to this fact, indeed, the distribution function $h(r)$ satisfies the two obvious conditions:

(37)   $h(r) = 0$ for $r \le \kappa_T$, $\qquad \int_{\kappa_T}^{\kappa_1} h(r)\, dr = 1.$

For $r \le \kappa_T$ the path of integration in (36) can be replaced by any closed contour enclosing all the singularities $r, \kappa_1, \ldots, \kappa_T$ ($r$ is a singularity only if $T$ is odd). Taking as such a contour the circle $|\kappa| = R$ with $R$ tending to infinity, we find that $h(r) = 0$ because the integrand is of an order $\kappa^{-2}$ at $\kappa = \infty$. Further, if $\gamma_r$ is again replaced by $\gamma'_r$ which runs entirely along the real axis,

$\int_{\kappa_T}^{\kappa_1} h(r)\, dr = \dfrac{\frac12 T - 1}{2\pi i} \int_{\gamma'_{\kappa_T}} \dfrac{d\kappa}{\prod_{t=1}^{T} (\kappa - \kappa_t)^{1/2}} \int_{\kappa_T}^{\kappa} (\kappa - r)^{\frac12 T - 2}\, dr = \dfrac{1}{2\pi i} \int_{\gamma'_{\kappa_T}} \dfrac{(\kappa - \kappa_T)^{\frac12 T - 1}}{\prod_{t=1}^{T} (\kappa - \kappa_t)^{1/2}}\, d\kappa = 1,$

because the integrand in the last integral is of the order $\kappa^{-1}$ at the point $\kappa = \infty$.

The quantities $r$ and $\kappa_t$ enter into the right hand member of (36) only in the form of differences from the integration variable $\kappa$. The addition of a constant $\epsilon$ to both $r$ and the $\kappa_t$ will therefore merely result in a change of location of the distribution on the $r$-axis without a change in form:

(38)   $h^*(r + \epsilon) = h(r).$

This could be expected since such a transformation means the addition of $\epsilon p$ to the quadratic form $q$ studied. It follows that the validity of (36) is not limited to positive definite quadratic forms $q$, since any other quadratic form can be transformed to a positive definite form by this operation if a sufficiently large value of $\epsilon$ is taken.

The function $h(r)$ is a different analytic function between any two different successive characteristic values $\kappa_t$ and $\kappa_{t+1}$. The expression (36) holds for even and for odd values of $T$, and is also valid for any number of coincidences in the set of characteristic values $\kappa_t$. It is true that integration along the paths $\gamma'$ or $\gamma'_r$ entirely coincident with the real axis, such as has been introduced in intermediate stages of the above proof, cannot be done if two or more of the $\kappa_t$ coincide, because of divergence of the integral. Once (36) has been established for distinct characteristic values, however, it follows from considerations of continuity that this result holds good also if coincidences occur in the set $\kappa_t$.

The function $h(r)$ has been studied by von Neumann [4] by an entirely different and very ingenious method for the special case that $T$ is even while no two characteristic values are equal, and for the case that the characteristic values are equal two by two but otherwise different. The properties established by von Neumann, and some generalizations of these properties, can be derived from (36). If $T$ is even, the derivative of $h(r)$ of order $\frac12 T - 1$

(39)   $h^{(\frac12 T - 1)}(r) = (-1)^{\frac12 (T - l - 1)}\, \dfrac{(\frac12 T - 1)!}{\pi \prod_{t=1}^{T} |r - \kappa_t|^{1/2}}$   if $\kappa_{l+1} < r < \kappa_l$ and $l$ is odd,
       does not exist for $r = \kappa_t$, $t = 1, 2, \ldots, T$,
       $= 0$ for all other values of $r$.

If all characteristic values are distinct, all derivatives of an order lower than $\frac12 T - 1$ exist and are continuous everywhere. Generally, whether $T$ be even or odd, at a point where $k$ characteristic values coincide, the derivative $h^{(j)}(r)$ exists and will be continuous if $j \le \frac12 (T - k) - \frac32$ and will not exist if $j \ge \frac12 (T - k) - 1$.

If the characteristic values are pairwise equal,

(40)   $\kappa_{2s-1} = \kappa_{2s} = \lambda_s, \qquad s = 1, 2, \ldots, S,$

but otherwise distinct, their total number $T = 2S$ must be even, and the only singularities of the integrand in (36) are poles at the points $\kappa = \lambda_s$. Accordingly the path of integration can be considered as a closed curve, and the integral in (36) can be replaced by the sum of the residues of the integrand at all poles inside the curve:

(41)   $h(r) = (S - 1) \sum_{s=1}^{l} \dfrac{(\lambda_s - r)^{S-2}}{F'(\lambda_s)}$   if $\lambda_{l+1} < r < \lambda_l.$

Here $F'(\lambda)$ is the derivative of

(42)   $F(\lambda) = \prod_{s=1}^{S} (\lambda - \lambda_s),$

its value in the point $\lambda = \lambda_s$ being

(43)   $F'(\lambda_s) = \prod_{u=1,\; u \ne s}^{S} (\lambda_s - \lambda_u).$

For $S = 2$ this is simply the rectangular distribution

(44)   $h(r) = \dfrac{1}{\lambda_1 - \lambda_2}, \qquad \lambda_2 < r < \lambda_1.$

The numerical calculation of the distribution (36) with distinct characteristic values is extremely cumbersome except for very small values of $T$. If the characteristic values $\kappa_t$ follow some definite pattern, however, it may be possible in some instances to work out a reasonable approximation formula. Two examples of this type will be shown in section 5.
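For pairwise equal characteristic values the density can be evaluated directly as a sum of residues. The sketch below takes formula (41) in the form $h(r) = (S-1)\sum_{\lambda_s > r} (\lambda_s - r)^{S-2}/F'(\lambda_s)$ with $F'(\lambda_s) = \prod_{u \ne s}(\lambda_s - \lambda_u)$; the function name `h_pairwise` and the example values of $\lambda_s$ are assumptions of the illustration.

```python
def h_pairwise(r, lam):
    """Density of r = q/p for T = 2S variables whose characteristic values are equal two by two."""
    lam = sorted(lam, reverse=True)
    S = len(lam)
    total = 0.0
    for s, ls in enumerate(lam):
        if ls > r:                       # only poles to the right of r contribute
            deriv = 1.0
            for u, lu in enumerate(lam):
                if u != s:
                    deriv *= ls - lu     # F'(lambda_s)
            total += (ls - r) ** (S - 2) / deriv
    return (S - 1) * total

# S = 2: rectangular distribution between lambda_2 and lambda_1.
print(h_pairwise(0.3, [1.0, -1.0]))
# S = 3 with lambda = 1, 0, -1: a triangular density on (-1, 1).
print(h_pairwise(0.5, [1.0, 0.0, -1.0]), h_pairwise(-0.5, [1.0, 0.0, -1.0]))
```

Below the smallest $\lambda_s$ all residues are included and the sum cancels to zero, in line with the requirement that the density vanish outside the range of the characteristic values.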


4. Another proof that covers also cases with T < 5. The proof of (36) given above holds only for $T \ge 5$. Once the form of (36) is known or presumed, however, another proof of its validity is available, which has mathematical interest in itself, and covers all cases from $T = 2$ upwards. This is a proof by complete induction, based on the proposition that, if (36) holds for $T$ variables, then it also holds for $T + 1$ variables. This proposition again rests on the recurrent relation

(45)   $h_{T+1}(r') = \dfrac{\Gamma(\frac12 T + \frac12)}{\Gamma(\frac12 T)\,\Gamma(\frac12)}\, (r' - \kappa_{T+1})^{\frac12 T - 1} \int_{\max(\kappa_T,\, r')}^{\kappa_1} \dfrac{h_T(r)\, dr}{(r - r')^{1/2}\,(r - \kappa_{T+1})^{\frac12 (T - 1)}}, \qquad \kappa_{T+1} < r' < \kappa_1,$

proved elsewhere in this issue by von Neumann³ [5]. It connects the distribution function $h_T(r)$ for $T$ variables with the function $h_{T+1}(r')$ obtained by the addition of one variable $x_{T+1}$ and one characteristic value $\kappa_{T+1}$. We shall substitute the "presumed" expression (36) for $h_T(r)$ in (45) in order to show that the result for $h_{T+1}(r')$ is the same expression with $T$ increased by one. In this proof it has for simplicity's sake been assumed that the new characteristic value $\kappa_{T+1}$ is smaller than any of those already present, and that no two of the $\kappa_t$ are equal. It is then possible again to select in (36) the path of integration $\gamma'_r$ which proceeds along the real axis from $r$ to $\kappa_1$ and returns along the real axis to $r$, passing each singularity in the same way as $\gamma_r$ does. If the integral (36) is substituted in (45) in this form, the order of integration over $\kappa$ and $r$ can be reversed, the result being


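(46)   $h_{T+1}(r') = \dfrac{\Gamma(\frac12 T + \frac12)}{\Gamma(\frac12 T)\,\Gamma(\frac12)} \cdot \dfrac{\frac12 T - 1}{2\pi i}\, (r' - \kappa_{T+1})^{\frac12 T - 1} \int_{\gamma'_{r'}} \dfrac{d\kappa}{\prod_{t=1}^{T} (\kappa - \kappa_t)^{1/2}} \int_{r'}^{\kappa} \dfrac{(\kappa - r)^{\frac12 T - 2}\, dr}{(r - r')^{1/2}\,(r - \kappa_{T+1})^{\frac12 (T - 1)}}.$

Writing for greater clarity $\kappa_{T+1} = a$, $r' = b$, $\kappa = c$, $r = z$, we have to evaluate the integral

(47)   $I_T(a, b, c) = \int_b^c (z - a)^{-\frac12 (T - 1)}\,(z - b)^{-\frac12}\,(c - z)^{\frac12 T - 2}\, dz, \qquad a < b < c, \quad T \ge 3,$

with the positive square roots taken if $z$ is real and $b < z < c$. Suppose first that $T = 2S + 1$ or odd. Then the integrand

(48)   $\phi_{2S+1}(z) = (z - a)^{-S}\,(z - b)^{-\frac12}\,(c - z)^{S - \frac32}$

has singularities at $a$, $b$ and $c$, of which only those at $b$ and $c$ are of a type such that $\phi_{2S+1}(z)$ changes its sign if the argument $z$ is turned once around the singularity. It follows that

(49)   $2 I_{2S+1} = \oint_{\delta} \phi_{2S+1}(z)\, dz,$

the path of integration $\delta$ being as indicated in Figure 3. For if the curve $\delta$ is contracted so as to run entirely along the real axis, from $b$ to $c$ and back to $b$, both parts of the path each yield a contribution equal to $I_{2S+1}$.

³ I am greatly indebted to Professor von Neumann for communicating this relation to me before its publication.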



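Since the integrand is regular and of the order $z^{-2}$ at infinity, it follows further that

(50)   $-2 I_{2S+1} = \oint_{\epsilon} \phi_{2S+1}(z)\, dz,$

where $\epsilon$, as in Figure 3, encloses the only singularity not enclosed by $\delta$. In a neighborhood of $z = a$ the following expansion of $\phi_{2S+1}(z)$ holds:

(51)   $\phi_{2S+1}(z) = \sum_{s=0}^{\infty} \dfrac{(z - a)^{-S+s}}{s!} \left[\left(\dfrac{\partial}{\partial z}\right)^{s} (z - b)^{-\frac12}\,(c - z)^{S - \frac32}\right]_{z=a}.$

The only term contributing to (50) is that with $-S + s = -1$. Since we selected a branch of $\phi_{2S+1}(z)$ such that $(z - b)^{-\frac12}(c - z)^{S - \frac32}$ falls on the positive pure imaginary axis for real values of $z$ below $b$, this term can be written

(52)   $i\, \dfrac{(z - a)^{-1}}{(S - 1)!} \left[\left(\dfrac{\partial}{\partial z}\right)^{S-1} (b - z)^{-\frac12}\,(c - z)^{S - \frac32}\right]_{z=a},$

where positive square roots should now be taken. The contribution of this term in (50) is $2\pi i$ times the coefficient of $(z - a)^{-1}$, and therefore

(53)   $I_{2S+1} = \dfrac{\pi}{(S - 1)!} \left[\left(\dfrac{\partial}{\partial z}\right)^{S-1} (b - z)^{-\frac12}\,(c - z)^{S - \frac32}\right]_{z=a} = \dfrac{\pi\,\Gamma(S - \frac12)}{\Gamma(S)\,\Gamma(\frac12)} \cdot \dfrac{(c - b)^{S-1}}{(b - a)^{S - \frac12}\,(c - a)^{\frac12}}.$

Since $\Gamma(\frac12) = \pi^{\frac12}$ it follows that for odd values of $T$

(54)   $I_T = \dfrac{\Gamma(\frac12 T - 1)\,\Gamma(\frac12)}{\Gamma(\frac12 T - \frac12)} \cdot \dfrac{(c - b)^{\frac12 (T - 3)}}{(b - a)^{\frac12 T - 1}\,(c - a)^{\frac12}}.$

It is easily seen that the same relation holds good if $T = 2S$ is even. In that case it follows from (47) that

(55)   $\left(\dfrac{\partial}{\partial c}\right)^{S-1} I_{2S} = (S - 2)!\,(c - a)^{-S + \frac12}\,(c - b)^{-\frac12}.$

In a manner similar to the transformations in (53) it can likewise be proved that the right hand member in (54) has the same derivative of order $S - 1$ with respect to $c$. It follows that the two members of (54) differ by a polynomial $Q(c)$ in $c$ of a degree at most equal to $S - 2$, the coefficients of which may depend on $a$ and $b$. However, both members of (54) as well as their first $S - 2$ derivatives with respect to $c$ vanish if $c = b$. Therefore $Q$ vanishes identically, and (54) holds for any integral values of $T$ not smaller than 3.

Finally, if (54) is inserted in (46) an expression for $h_{T+1}(r')$ is obtained, which corresponds to (36) with $T$ replaced by $T + 1$.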

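It remains to prove (36) for some initial value of $T$. For $T = 2$ the integral in (36) is divergent, but the form of $h(r)$ is easily found directly. Writing

(56)   $p = x_1^2 + x_2^2, \qquad r = \dfrac{\kappa_1 x_1^2 + \kappa_2 x_2^2}{x_1^2 + x_2^2},$

we find that

(57)   $\dfrac{\partial(x_1, x_2)}{\partial(p, r)} = \left[\dfrac{\partial(p, r)}{\partial(x_1, x_2)}\right]^{-1} = \begin{vmatrix} 2x_1 & 2x_2 \\ \dfrac{2x_1}{p}(\kappa_1 - r) & \dfrac{2x_2}{p}(\kappa_2 - r) \end{vmatrix}^{-1} = -\dfrac{p}{4 x_1 x_2 (\kappa_1 - \kappa_2)} = -\dfrac{1}{4(\kappa_1 - r)^{1/2}(r - \kappa_2)^{1/2}}.$

The probability density in the $x_1x_2$-plane is, of course, $\dfrac{1}{2\pi} e^{-\frac12 p}$; but in making the transformation (57) a factor 4 must be applied to account for the fact that to given values of $p$ and $r$ correspond 4 sets of values of $x_1$ and $x_2$, differing in the signs only. This leads to the joint distribution of $p$ and $r$

(58)   $\dfrac{1}{2\pi} \cdot \dfrac{e^{-\frac12 p}}{(\kappa_1 - r)^{1/2}(r - \kappa_2)^{1/2}}\, dp\, dr$

and, after integration over $p$, to

(59)   $h(r) = \dfrac{1}{\pi (\kappa_1 - r)^{1/2}(r - \kappa_2)^{1/2}}$   if $\kappa_2 < r < \kappa_1$,
       $h(r) = 0$   if $r < \kappa_2$ or $\kappa_1 < r$,

in accordance with (39).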

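Finally, if (59) is inserted in (45) with $T = 2$, the result is

(60)   $h_3(r') = \dfrac{1}{2\pi} \int_{[\kappa_2,\, r']}^{\kappa_1} \dfrac{(r - r')^{-1/2}\, dr}{(\kappa_1 - r)^{1/2}\,(r - \kappa_2)^{1/2}\,(r - \kappa_3)^{1/2}}, \qquad \kappa_3 < r' < \kappa_1,$

if $[\kappa_2, r']$ denotes the largest of $\kappa_2$ and $r'$. Writing $\kappa$ for $r$, we find that this is in accordance with (36) for $T = 3$, if account is taken of the rule established for selecting the branch of the integrand in (36). For, taking the path along the real axis, the contributions from the two parts of the path between $\kappa_2$ and $\kappa_1$ reinforce each other, while for $r' < \kappa_2$ the remaining contributions (from the interval between $r'$ and $\kappa_2$) add up to zero. This completes the second proof of (36).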


5. Application to serial correlation. We shall now derive the characteristic values $\kappa_t$ in the case that

(61)   $q = m = x_1x_2 + x_2x_3 + \cdots + x_{T-1}x_T.$

It will be of interest to compare this case with the slightly modified case of the quadratic form

(62)   $\bar m = x_1x_2 + x_2x_3 + \cdots + x_{T-1}x_T + x_Tx_1,$

which contains an additional term $x_Tx_1$ accomplishing a circular arrangement of the variables. This modification was originally suggested by Hotelling in order to simplify the characteristic polynomial. Other simplifications arising out of the circular arrangement will appear below. It is possible, of course, that the power of the test of significance of serial correlation is slightly affected by the substitution of $\bar m$ for $m$, but this presumption needs corroboration by a study of power functions.

The characteristic values of $m$ are those values of $\kappa$ for which the determinant of order $T$

(63)   $\Delta_T \equiv \begin{vmatrix} -\kappa & \frac12 & 0 & \cdots & 0 & 0 \\ \frac12 & -\kappa & \frac12 & \cdots & 0 & 0 \\ 0 & \frac12 & -\kappa & \cdots & 0 & 0 \\ \vdots & & & \ddots & & \vdots \\ 0 & 0 & 0 & \cdots & \frac12 & -\kappa \end{vmatrix} = 0.$

By development according to elements of the first row we find that

(64)   $\Delta_T = -\kappa \Delta_{T-1} - \tfrac14 \Delta_{T-2},$

from which it follows that

(65)   $\Delta_T = c_1 \xi_1^T + c_2 \xi_2^T,$

if $\xi_1$ and $\xi_2$ are the roots of

(66)   $\xi^2 + \kappa \xi + \tfrac14 = 0,$

satisfying

(67)   $\xi_1 + \xi_2 = -\kappa, \qquad \xi_1 \xi_2 = \tfrac14.$

By inserting the known values of $\Delta_1$ and $\Delta_2$ in (65), the values of $c_1$ and $c_2$ are easily found to be such that

(68)   $\Delta_T = \dfrac{\xi_1^{T+1} - \xi_2^{T+1}}{\xi_1 - \xi_2}.$

Although as a polynomial in $\kappa$ this is a rather complicated expression, the implicit form (68) will suffice for finding the roots of (63). Expressing all other variables in terms of one new variable $\omega$,

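(69)   $\xi_1 = -\tfrac12 \omega, \qquad \xi_2 = -\tfrac12 \omega^{-1}, \qquad \kappa = \tfrac12 (\omega + \omega^{-1}),$

we find for (68),

(70)   $\Delta_T = \left(-\tfrac12\right)^T \dfrac{\omega^{T+1} - \omega^{-T-1}}{\omega - \omega^{-1}}.$

The only values of $\omega$ for which this expression vanishes are the roots of

(71)   $\omega^{2(T+1)} = 1,$

excepting those that are also roots of

(72)   $\omega^2 = 1.$

This leaves us with

(73)   $\omega = e^{\pm i\pi t/(T+1)}, \qquad t = 1, 2, \ldots, T.$

The corresponding characteristic values are

(74)   $\kappa_t = \cos \dfrac{\pi t}{T + 1}, \qquad t = 1, 2, \ldots, T,$

because the same value of $\kappa_t$ is obtained whether the positive or the negative sign is taken in (73). These are $T$ different values $\kappa_t$, and hence each one is a single root of (63).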
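The characteristic values $\cos\{\pi t/(T+1)\}$ can be checked against the determinant recurrence $\Delta_T = -\kappa\Delta_{T-1} - \frac14\Delta_{T-2}$. The sketch below starts the recurrence from $\Delta_0 = 1$ and $\Delta_1 = -\kappa$; the function names `delta` and `characteristic_values` and the choice $T = 7$ are assumptions of the illustration.

```python
import math

def delta(T, kappa):
    """Determinant (63) of order T, computed by the recurrence (64)."""
    prev, cur = 1.0, -kappa          # Delta_0 and Delta_1
    for _ in range(T - 1):
        prev, cur = cur, -kappa * cur - 0.25 * prev
    return cur

def characteristic_values(T):
    """Values cos(pi * t / (T + 1)), t = 1, ..., T, as in (74)."""
    return [math.cos(math.pi * t / (T + 1)) for t in range(1, T + 1)]

values = characteristic_values(7)
residuals = [delta(7, k) for k in values]
print(values)
print(max(abs(v) for v in residuals))
```

The determinant vanishes, up to rounding, at each of the $T$ values, and the values come out in strictly decreasing order, so that all roots are single.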

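The characteristic values of $\bar m$ can now be derived from (70), although a simple straightforward method based on the properties of circulants is also available (see [6], p. 13). Writing

(75)   $\bar\Delta_T \equiv \begin{vmatrix} -\kappa & \frac12 & 0 & \cdots & 0 & \frac12 \\ \frac12 & -\kappa & \frac12 & \cdots & 0 & 0 \\ \vdots & & & \ddots & & \vdots \\ \frac12 & 0 & 0 & \cdots & \frac12 & -\kappa \end{vmatrix} = \Delta_T - \tfrac14 \Delta_{T-2} + 2(-1)^{T-1}\left(\tfrac12\right)^T,$

we find easily from (70) that

(76)   $\bar\Delta_T = \left(-\tfrac12\right)^T \left(\dfrac{\omega^{T+1} - \omega^{-T-1}}{\omega - \omega^{-1}} - \dfrac{\omega^{T-1} - \omega^{-T+1}}{\omega - \omega^{-1}} - 2\right) = \left(-\tfrac12\right)^T (\omega^T + \omega^{-T} - 2) = -4\left(-\tfrac12\right)^T \sin^2 \tfrac12 T\alpha,$

if

(77)   $\omega = e^{i\alpha}.$

A complete set of the values $\omega_t$ for which $\bar\Delta_T$ vanishes is found from

(78)   $\alpha_t = \dfrac{2\pi t}{T}, \qquad t = 1, 2, \ldots, T,$

and the corresponding characteristic values⁵ $\bar\kappa_t$ are, according to (69),

(79)   $\bar\kappa_t = \cos \alpha_t = \cos \dfrac{2\pi t}{T}, \qquad t = 1, 2, \ldots, T.$

⁵ These characteristic values are not numbered according to decreasing size.

In contradistinction to the case without circular arrangement, the characteristic values with indices $t$ and $T - t$ now coincide, such that all characteristic values are double except one ($\bar\kappa_T = 1$) if $T$ is odd, and except two ($\bar\kappa_T = 1$, $\bar\kappa_{\frac12 T} = -1$) if $T$ is even.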
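The values $\cos(2\pi t/T)$ and their pairing can be verified directly on the matrix of the circular form, which has $\frac12$ next to the diagonal and in the two corners. The sketch below checks the characteristic vectors with components $\cos(2\pi t j/T)$; these vectors, the function names and the choice $T = 8$ are assumptions of the illustration and are not taken from the text.

```python
import math

def circular_values(T):
    """Values cos(2 * pi * t / T), t = 1, ..., T, as in (79)."""
    return [math.cos(2 * math.pi * t / T) for t in range(1, T + 1)]

def apply_circular_form(v):
    """Multiply v by the matrix of the circular form (62)."""
    T = len(v)
    return [0.5 * (v[(j - 1) % T] + v[(j + 1) % T]) for j in range(T)]

T = 8
values = circular_values(T)
worst = 0.0
for t, kappa in enumerate(values, start=1):
    v = [math.cos(2 * math.pi * t * j / T) for j in range(T)]
    Av = apply_circular_form(v)
    worst = max(worst, max(abs(a - kappa * b) for a, b in zip(Av, v)))

distinct = len({round(k, 9) for k in values})
print(worst, distinct)
```

For $T = 8$ only five distinct values occur: three double values together with the single values $+1$ and $-1$.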

Taking advantage of the duplicity of almost all characteristic values, Anderson [6] has derived expressions equivalent to (36) for this case, using methods that depend on this particular condition. On the basis of these results he has computed 99- and 95-percentiles in the distribution of $\bar r = \bar m / p$ for the values $T$ = 2, 3, 4, 5, 6, 7, 9, 11, 13, 15, 25, 45, interpolating the percentiles for intermediate values of $T$. The 95-percentile for $T = 45$ is 0.240, as compared with 0.261 for the normal distribution that provides an asymptotic approximation.

Whereas on this showing the normal approximation is slow in becoming accurate with increasing $T$, a method for obtaining a much closer approximation is available, which works out simplest with respect to $\bar r$, but can also be applied to $r$. The principle of this method is applicable whenever the characteristic values follow a definite mathematical pattern.

The method consists in replacing the finite number of discrete values $\bar\kappa_t$ in (36) by a continuous variable $\lambda$, distributed according to a density function suggested by, and as closely as possible approximating to, the scatter of the values $\bar\kappa_t$. According to (79) the values $\bar\kappa_t$ are ordinates of the cosine function at equidistant points spaced out so as to cover one complete period $2\pi$ of that function. It is natural to approximate this scatter by the density function

(80)   $\chi(\lambda) = \dfrac{T}{\pi (1 - \lambda^2)^{1/2}}, \qquad -1 < \lambda < 1,$

of the cosine $\lambda = \cos \dfrac{2\pi\xi}{T}$ of an expression in which the variable $\xi$ has a rectangular distribution between 0 and $T$. The numerical factor in (80) is such that

(81)   $\int_{-1}^{1} \chi(\lambda)\, d\lambda = T$

equals the total number of characteristic values to be replaced by a density function. The idea underlying the substitution of $\chi(\lambda)$ for the $\bar\kappa_t$ is to obtain what intuitively seems to be, in some sense, the closest approximation to the exact distribution function $\bar h(\bar r)$ that has continuous derivatives of any order in any point except the two points ($\bar r = -1$ and $\bar r = +1$) that limit its range.

The factor in the integrand in (36) which involves the $\bar\kappa_t$ is approximated as follows:

(82)   $\prod_{t=1}^{T} (\kappa - \bar\kappa_t)^{-1/2} \approx \exp\left\{-\dfrac12 \int_{-1}^{1} \chi(\lambda) \log(\kappa - \lambda)\, d\lambda\right\} = \exp\left\{-\dfrac{T}{2\pi} \int_{-1}^{1} \dfrac{\log(\kappa - \lambda)}{(1 - \lambda^2)^{1/2}}\, d\lambda\right\}.$



In order to evaluate the integral

(83)   $J \equiv \dfrac{1}{\pi} \int_{-1}^{1} \dfrac{\log(\kappa - \lambda)}{(1 - \lambda^2)^{1/2}}\, d\lambda$

we shall first prove that its real part is independent of $\kappa$, or that

(84)   $\Re \int_{-1}^{1} \dfrac{\log(\kappa - \lambda) - \log \lambda}{(1 - \lambda^2)^{1/2}}\, d\lambda = 0,$

if $\Re$ denotes "the real part of". The integrand in (84) has singularities at the points $\lambda = -1, 0, \kappa, 1$. These are of two types. The singularities $\lambda = \pm 1$ are introduced by the denominator and make the integrand change its sign if the argument $\lambda$ is turned once around either singularity. If starting from a point on the real axis we turn the argument $\lambda$ once around either of the other singularities, $\lambda = 0$ and $\lambda = \kappa$, introduced by the numerator, then the real part of the integrand is not affected, while $2\pi i$ or $-2\pi i$ is added to the imaginary part of the numerator, depending on the sense (clockwise or anti-clockwise) of the rotation and on the sign of the logarithm in (84) responsible for the singularity. It follows that one revolution along a closed curve $\beta$ containing all four of the singularities, as indicated in Figure 4, carries us back to the same branch of the integrand, after mutually offsetting additions to the imaginary part of the numerator and after two changes in sign. This is in accordance with the regular character of the integrand at the point $\lambda = \infty$.

Figure 4. The integration path $\beta$

It follows furthermore that the left hand member of (84) ean be rejilaml by 

-X) 


( 86 ) 


i9i f TJpg t- 


dx. 


For, If the curve p is constricted to a path 0' running along Urn real«» from -1 
to +1 and back to -1, the contributions of the two halvw of Uw jialh will Ito 

Tthl ^ ^ f«*- the pwbi 

of the path $\beta'$ between 0 and $\kappa$, because the behavior of the real part

$\log|\kappa - \lambda| - \log|\lambda|$

not been eliminated. ^ ^ w^herft the uaaginary pari haa 





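Finally, if $\beta$ in (85) is replaced by a large circle $|\lambda| = R$, the validity of (84)
follows from the fact that (85) tends to zero if $R$ tends to infinity because the
integrand is of the order of magnitude of $R^{-2}$.
The real part of the integral in (83) accordingly is

(86) $\Re J = \frac{1}{\pi}\int_{-1}^{1}\frac{\log|\lambda|}{(1-\lambda^2)^{\frac{1}{2}}}\,d\lambda,$

or, by the transformation $\lambda = \sin x$,

(87) $\Re J = \frac{2}{\pi}\int_0^{\pi/2}\log\sin x\,dx = \frac{2}{\pi}\int_0^{\pi/2}\log\cos x\,dx = \frac{1}{\pi}\int_0^{\pi/2}\log(\sin x\cos x)\,dx = \frac{1}{\pi}\int_0^{\pi/2}\log(\tfrac{1}{2}\sin 2x)\,dx = \tfrac{1}{2}\log\tfrac{1}{2} + \tfrac{1}{2}\Re J,$

so that

(88) $\Re J = -\log 2.$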
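The value in (88) can be checked numerically. The sketch below is only an illustration, not part of the original derivation: it assumes that $J$ carries the factor $1/\pi$, substitutes $\lambda = \cos\theta$, and applies a plain midpoint rule; the function name `real_part_J` is hypothetical.

```python
import math

def real_part_J(kappa, n=200_000):
    """Midpoint-rule estimate of (1/pi) * integral_0^pi log|kappa - cos(theta)| d(theta).

    With lambda = cos(theta) this is the real part of the integral in (83),
    assuming (for this sketch) that J is defined with the factor 1/pi.
    """
    h = math.pi / n
    total = 0.0
    for k in range(n):
        theta = (k + 0.5) * h
        d = abs(kappa - math.cos(theta))
        if d > 0.0:  # skip an exact hit on the (integrable) log singularity
            total += math.log(d)
    return total * h / math.pi

# For any kappa between -1 and 1 the estimate should be close to -log 2.
for kappa in (0.0, 0.3, -0.8):
    print(kappa, real_part_J(kappa), -math.log(2.0))
```

The estimate does not depend on the chosen $\kappa$ inside the segment, which is the content of (84).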

In order to evaluate the imaginary part of (83), it is necessary to specify on which side the singularity $\kappa$ is passed by the integration variable $\lambda$. In fact, both cases need to be considered: the passage of $\lambda$ "over" $\kappa$ for values of $\kappa$ on the first part of the path of integration $\gamma'$ of $\kappa$ in (36), where $\kappa$ goes along the real axis from $\bar r$ to 1; and the passage of $\lambda$ "under" $\kappa$ for values of $\kappa$ on the second part of its path $\gamma'$, from 1 back to $\bar r$. If the upper sign in the following formulae relates to the first of these two cases, we have

(89) $\Im J = \mp \arccos\kappa,$

and, from (88) and (89), we find for the last member in (82)

(90) $e^{-\frac{1}{2}TJ} = 2^{\frac{1}{2}T}\,e^{\pm\frac{1}{2}iT\arccos\kappa}.$

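Writing

(91) $\arccos\kappa = \alpha,\quad \kappa = \cos\alpha,\quad e^{\frac{1}{2}iT\alpha} - e^{-\frac{1}{2}iT\alpha} = 2i\sin\tfrac{1}{2}T\alpha,$

we find the following approximation for $h(\bar r)$ by inserting (90) in (36) as indicated
in (82):

(92) $h(\bar r) \approx \frac{(\frac{1}{2}T-1)\,2^{\frac{1}{2}T}}{\pi}\int_0^{\arccos\bar r}(\cos\alpha-\bar r)^{\frac{1}{2}T-2}\,\sin\tfrac{1}{2}T\alpha\,\sin\alpha\,d\alpha.$

Calculations of the distribution function and of its percentiles will be much
simpler for this approximation than for the exact function.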

In the case of $r = m/p$ in which no circular arrangement is made a slight complication arises. The characteristic values $\kappa_t$ given in (74) are again ordinates of the cosine function at equidistant points, but they do not cover a complete period or half-period of this function. Probably the most accurate procedure would be to replace the limits of integration in (83) by $\cos[\frac{1}{2}\pi/(T+1)]$ and $\cos[(T+\frac{1}{2})\pi/(T+1)]$, so as to have each discrete value of $t$ in (74) contribute an interval $(t-\frac{1}{2},\, t+\frac{1}{2})$ of unit length to the range of the rectangularly distributed variable $\tau$ on which depends the distribution of $\lambda = \cos[\pi\tau/(T+1)]$, while making such an adjustment of the numerical factor in (80) that the equivalent of (81) with the new limits of integration is satisfied. However, the evaluation of (83) and the simplicity of the result essentially rest on the fact that the limits of integration coincide with singularities of the integrand. In these circumstances a rather simple result can again be obtained by introducing two further changes which very nearly compensate each other. The first change is the arbitrary extension of the limits of integration to what they are in (83), while increasing the numerical factor in (80) in such a manner that the integral in (81) will be $T+1$ instead of $T$. This leaves the described contributions of the discrete values of $t$ in (74) to the range of $\tau$ unaffected, but adds to that range the two intervals $(0, \frac{1}{2})$ and $(T+\frac{1}{2},\, T+1)$ of half a unit length not representing anything that was actually present. This can be largely offset by introducing two additional discrete values, $t = 0$ and $t = T+1$, each with the negative weight $-\frac{1}{2}$, if the weight of all other discrete values is considered to be $+1$. Instead of (82) we then have

(93) $\prod_{t=1}^{T}(\kappa-\kappa_t)^{-\frac{1}{2}} \approx \exp\left[-\tfrac{1}{2}(T+1)J + \tfrac{1}{4}\log(\kappa-1) + \tfrac{1}{4}\log(\kappa+1)\right].$

If this expression is inserted in (36) with $\gamma$ constricted to $\gamma'$, the argument of
(94) e“’® (k - 1 )"‘ 

is $-\pi/4$ when $\kappa$ goes from $r$ to 1, and $\pi/4$ when $\kappa$ returns from 1 to $r$. On
account of

(1 — =» sin* cr, 

g-i(T+i)fa+rfrt ^ 2t gin jj(y ^ 


(95) 


i(r+l)te-r«« 


the result now is 


h(r) 


(ir - 1)2*^-^^ 


(96) 


J AKro 009 r 

I (oosa - r)*’'"*8mlKT-f 1)« 


r/dl sin*'® (flt <hi, 

integration that the eonditinns equivalent 
frn^ i f approximate expressiom (92) and (M). Thw follows 

from the fact that the difference in degree between numerator and
denominator in (36) is preserved by the substitutions (82) and (93);
that the numerical value of the limit for $\kappa \to \infty$ of $\kappa$ times the integrand in (36)
is not changed; and that no singularities outside the segment $-1 \le \kappa \le 1$ of
the real axis are introduced.

There is, of course, a certain degree of distortion involved in replacing the exact distribution functions by the smooth approximations derived. Such distortion is most serious in so far as it occurs at the tails of the distribution, where the usual significance limits are located. For instance, the exact distribution of $\bar r$ is asymmetric if $T$ is odd, and ranges from $\cos[(T-1)\pi/T]$ to $+1$, whereas the smooth approximation is symmetric and ranges from $-1$ to $+1$. In the case of $r$ both the exact distribution and the approximation are symmetric, but the former ranges from $\cos[T\pi/(T+1)]$ to $\cos[\pi/(T+1)]$, the latter from $-1$ to $+1$. However, this difference is to some extent compensated by a curious anomaly in the function (96). This function actually dips below zero on symmetrically placed small intervals adjoining $-1$ and $+1$, the length of which is of the order of the difference $1 - \cos[\pi/(T+1)]$ between unity and the highest characteristic value. Percentiles must therefore be counted on both sides from two points absolutely smaller than unity, defined by requiring that the small parts of the area "under" the curve (96) outside these points are algebraically zero each.

These distortions have importance only for small values of $T$. Anderson finds ([6] p. 52) that the exact function $h(\bar r)$ is symmetrical within three-decimal accuracy for all values of $T \ge 11$ (the modal value $h(0)$ for $T = 11$ is about 1.27). There are in the case of $\bar r$ three characteristic values $\kappa_t$ exceeding the 95-percentile as given by Anderson for $T = 7$; 5 for $T = 13$; 11 for $T = 25$. Corresponding numbers for the 99-percentile are 3 for $T = 12$; 9 for $T = 25$; 17 for $T = 45$. These numbers suggest that the approximations (92) and (96) will provide good significance limits long before the normal approximation is acceptable. Accurate calculations will be needed to find out from what value of $T$ onward the approximations can safely be substituted for the exact distributions.

REFERENCES

[1] J. Tinbergen, Business cycles in the United States of America 1919-1932, League of Nations, Geneva, 1939.
    M. M. Flood, "Recursive methods in business-cycle analysis," Econometrica, Vol. 8 (1940) p. 333.

[2] H. Wold, A Study in the Analysis of Stationary Time Series, Uppsala, 1938.

[3] S. S. Wilks, Statistical Inference, Ann Arbor, 1937.

[4] J. von Neumann, "Distribution of the ratio of the mean square successive difference to the variance," Annals of Math. Stat., Vol. 12 (1941), pp. 367-395.

[5] J. von Neumann, "A further remark concerning the distribution of the ratio of the mean square successive difference to the variance," Annals of Math. Stat., Vol. 13 (1942), pp. 86-88.

[6] R. L. Anderson, Serial Correlation in the Analysis of Time Series, unpublished thesis, Iowa State College, 1941.



A GENERALIZED ANALYSIS OF VARIANCE
By Franklin E. Satterthwaite

University of Iowa and Aetna Life Insurance Company

The analysis of variance is a statistical technique whose fields of application are only beginning to be explored. A few simple standard designs appear in the literature and a great deal has been done with them. However, if the applied statistician limits himself to such standard designs, he soon finds that many of his problems are receiving inadequate or inappropriate treatment. The writer has found this particularly true in his own field where most of the raw data is in the nature of frequencies or averages which lack homogeneity of variance. Also the nature of the problem usually indicates the use of weighted averages rather than simple averages and sometimes part of the data are missing.

The purpose of this study is to examine the fundamental principles underlying analysis of variance designs and to show how designs may be constructed and applied to practically any data which can be assumed to be normally distributed.


1. Test of independence. In the analysis of variance we calculate two or more statistics of the types,

$\chi^2 = \Sigma (x_i - m_i)^2,$

$\chi^2 = \Sigma \theta_j^2.$

The $x_i$'s are considered to be independent variables from a normal population. The $m_i$'s and the $\theta_j$'s are homogeneous linear functions of the $x_i$'s. Heretofore the demonstration of the independence of the $\chi^2$'s used has only been made for certain special $\theta_j$'s and $m_i$'s. To make our analysis general we shall let our $\theta_j$'s be general homogeneous linear functions of the $x_i$'s and we shall define our $m_i$'s through certain linear homogeneous restrictions.

Let us define Chi-square as

$\chi^2 = \Sigma (x_i - m_i)^2$

where the $x_i$'s are independent normally distributed variables with mean zero and unit variance. We also define certain linear functions of the $x_i$'s,¹

(1) $\theta_j = a_{ji} x_i, \quad j = 1, 2, \cdots, s,$

which we shall assume to have been orthogonalized.² To define the $m_i$'s we make use of the linear restrictions

¹ Repeated subscripts indicate summation from 1 to $n$ unless otherwise specified. The Kronecker Delta, $\delta_{ij}$, equals one or zero depending on whether $i$ equals or does not equal $j$.

² The $\theta_j$'s are orthogonal if $a_{ik}a_{jk} = \delta_{ij}$. Any algebraically independent set may be



(2) $a_{ji}(x_i - m_i) = 0, \quad j = 1, 2, \cdots, s,$

or

$a_{ji}m_i = a_{ji}x_i = \theta_j.$

This system has an $(n - s)$-infinitude of solutions and we should not expect all of these to be suitable for our purposes. For reasons which will appear later we shall choose the single solution,

(3) $m_k = a_{jk}\theta_j = a_{jk}a_{ji}x_i, \quad j = 1, \cdots, s.$

This is the solution which follows if we complete the system (2) with $n - s$ additional linear restrictions on the $m_i$'s which are homogeneous and which form an orthogonal set with (2). Thus

$a_{ji}m_i = \theta_j, \quad j = 1, \cdots, s,$

$a_{ji}m_i = 0, \quad j = s + 1, \cdots, n.$

This is consistent with standard analysis of variance designs. For the usual one way analysis, we have

(4) $\Sigma_k\, m_{jk} = \Sigma_k\, x_{jk}, \quad j = 1, \cdots, s,$

which yield a solution according to (3).

The additional homogeneous restrictions in this case might have been taken as

$m_{j1} = m_{j2} = \cdots = m_{jr}, \quad j = 1, \cdots, s,$

which are orthogonal to (4) and may be easily orthogonalized among themselves.
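Substituting the values of the $m_i$'s obtained in (3) into Chi-square, we obtain,

$\chi^2 = (x_i - m_i)(x_i - m_i)$

$= (\delta_{ik} - a_{ji}a_{jk})x_k\,(\delta_{il} - a_{mi}a_{ml})x_l, \quad j, m = 1, \cdots, s,$

$= (\delta_{kl} - a_{mk}a_{ml} - a_{jk}a_{jl} + a_{ji}a_{jk}a_{mi}a_{ml})x_k x_l$

$= (\delta_{kl} - a_{jk}a_{jl})x_k x_l.$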


replaced by an equivalent orthogonal set. Thus, if $\theta_i$ is not orthogonal to $\theta_j$, it may be replaced by $\theta_i' = \theta_i + k\theta_j$, where $k$ is determined by

$a_{jl}(a_{il} + k a_{jl}) = 0$

or $k = -a_{jl}a_{il}/a_{jl}a_{jl}$.

The condition $\Sigma a_{il}^2 = \Sigma a_{jl}^2 = 1$ can always be met by simple division.





The sum of the squares of the $\theta_j$'s is

$\Sigma\theta_j^2 = \theta_j\theta_j, \quad j = 1, 2, \cdots, s,$

$= a_{jk}a_{jl}x_k x_l.$

Therefore we have the relation,

(5) $\chi^2 + \Sigma\theta_j^2 = \delta_{kl}x_k x_l = \Sigma x_i^2.$

The rank, $R_j$, of each $\theta_j^2$ is obviously equal to unity since it is the square of a linear form. The rank, $R_0$, of $\chi^2$ is at least equal to $n - s$ since the rank of the right hand side of (5) is $n$. Also, $R_0$ can not be greater than $n - s$ since,

$a_{jk}(\delta_{kl} - a_{mk}a_{ml}) = a_{jl} - a_{jl}, \quad j, m = 1, \cdots, s,$

$= 0$

gives $s$ independent relations between the rows of its coefficient matrix. Therefore we have the relation,

(6) $R_0 + R_1 + \cdots + R_s = n.$

The two conditions, (5) and (6), are sufficient³ conditions for $\chi^2$ and the $\theta_j^2$'s each to be independent of the others and each to be distributed as is Chi-square with the number of degrees of freedom equal to its rank.
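Relations (5) and (6) can be illustrated numerically. The following is a minimal sketch under assumptions that are not in the original: the rows $a_{ji}$ are generated at random and orthonormalized by Gram-Schmidt, the sizes $n = 6$, $s = 2$ are arbitrary, and the helper names `gram_schmidt` and `decompose` are hypothetical.

```python
import random

def gram_schmidt(rows):
    """Return orthonormal rows spanning the same space as the given rows."""
    basis = []
    for row in rows:
        v = list(row)
        for b in basis:
            dot = sum(vi * bi for vi, bi in zip(v, b))
            v = [vi - dot * bi for vi, bi in zip(v, b)]
        norm = sum(vi * vi for vi in v) ** 0.5
        basis.append([vi / norm for vi in v])
    return basis

def decompose(a, x):
    """Given orthonormal rows a (s x n) and data x, return (chi2, thetas)."""
    # theta_j = a_ji x_i, as in (1)
    thetas = [sum(aji * xi for aji, xi in zip(row, x)) for row in a]
    # m_k = a_jk theta_j, as in (3)
    m = [sum(a[j][k] * thetas[j] for j in range(len(a))) for k in range(len(x))]
    chi2 = sum((xi - mi) ** 2 for xi, mi in zip(x, m))
    return chi2, thetas

random.seed(0)
n, s = 6, 2  # arbitrary illustrative sizes
a = gram_schmidt([[random.gauss(0, 1) for _ in range(n)] for _ in range(s)])
x = [random.gauss(0, 1) for _ in range(n)]

chi2, thetas = decompose(a, x)
lhs = chi2 + sum(t * t for t in thetas)   # left side of (5)
rhs = sum(xi * xi for xi in x)            # right side of (5)
# The matrix (delta_kl - a_jk a_jl) is idempotent, so its trace equals its rank R_0.
rank_chi2 = sum(1.0 - sum(a[j][k] ** 2 for j in range(s)) for k in range(n))
print(lhs, rhs, rank_chi2)
```

The two printed sums agree up to rounding, and the trace equals $n - s$, in line with (6).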


2. Adjustment of data. The above development is not general enough for many practical problems. We do not always have given data, $y_i$, which are normally distributed about a mean zero with unit (or homogeneous) variance. Of course if the means, $\bar m_i$, and variances, $\sigma_i^2$, are known, we may make the transformation,

(7) $x_i = \dfrac{y_i - \bar m_i}{\sigma_i}$

and apply our theory in a straightforward manner. We shall now check the effect on our analysis if the $\bar m_i$'s and $\sigma_i$'s are determined, in part at least, from our data, the $y_i$'s.

Let us assume that the $x_i$'s of (7) are normally and independently distributed variables about a mean zero and with unit variance. Let us also define certain linear orthogonal functions of the first $r$ of the $\theta_j$'s by

(8) $\phi_k = b_{kj}\theta_j = b_{kj}a_{ji}x_i, \quad k = 1, 2, \cdots, q; \; j = 1, 2, \cdots, r; \; i = 1, 2, \cdots, n.$

We next form the characteristic function of the joint distribution of $\chi^2$, of $\Sigma_1^r\theta_j^2$, of $\theta_{r+1}^2, \cdots, \theta_s^2$, and of $\phi_1, \cdots, \phi_q$. This is

³ A. T. Craig, "On the independence of certain estimates of variance," Annals of Math. Stat., Vol. 9 (1938), pp. 48-55.





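$\Phi(t, u, v_{r+1}, \cdots, v_s, w_1, \cdots, w_q)$

$= K\int\cdots\int \exp\left[it\chi^2 + iu\,\Sigma_1^r\theta_j^2 + i\,\Sigma_{r+1}^s v_j\theta_j^2 + i\,w_k\phi_k - \tfrac{1}{2}\Sigma x_i^2\right] dx_n \cdots dx_1.$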

The conditions (5) and (6) are sufficient⁴ for there to exist an orthogonal transformation of the $x_i$'s which will convert

dj to B,, j =- 1, • • • 8, 

X to 

srxj to sre-, 
dV = UdXi to ncffl,. 

The characteristic function then takes the form, 

<1. = A^jril / exp [-K1 ~ 2 iu )( b , - 

{Ilf exp [wl/2il - 2m)ll 
|iIh, / exp [-K1 ~ 2tu)0*]cf^,| 

/ exp [-K1 - 2it)6]]dd}i, 

wliere 

= Swi*, 

since the 1 ja-/s arc orthogonal. 

At the beginning of this section we stated that we wished in some way to use our data, the $y_i$'s, to estimate the $\bar m_i$'s and the $\sigma_i$'s. A suitable method is to restrict the $\phi$ functions, (8), to zero.

Our problem thus reduces to finding the distribution of the "array" in our joint distribution for which

$\phi_1 = \phi_2 = \cdots = \phi_q = 0.$

Except for perhaps a constant factor, the characteristic function of the distribution of such an array is obtained from $\Phi$ by integrating out the $w_k$'s.⁵ Thus, on performing the integrations, we have,

⁴ See A. T. Craig, ibid.

⁵ This is easily seen since if one passes from the characteristic function to the joint distribution, equates the $\phi_k$'s to zero, and then passes back to the characteristic function, all the integrations except the above appear in pairs of the form


which leave $\Phi$ unchanged.





4>'(<, U, Vr +1 , • • • , v.) 


iffa - 2iur*^ ^'^iniUin - 2»,» ’'i 

"U 'isiS "'’'f. 


== ■i>,{u)in;+,4-,(o,))<i>40. 

which shows that $\Sigma_1^r\theta_j^2$, $\theta_{r+1}^2, \cdots, \theta_s^2$, and $\chi^2$ are each independent of the others and that each is distributed according to the Chi-square distribution with $r - q$, $1, \cdots, 1$, and $n - s$ degrees of freedom respectively.


3. Numerical application. The developments of the preceding sections have been abbreviated to cover technical points alone. We shall now take a definite practical problem and see how we may work out its solution with the aid of the above techniques.

In Table I are given the losses, the exposures (in car years) and the indicated pure premiums from the Massachusetts Statutory Liability automobile insurance experience for four towns and for three different classes of cars. (To illustrate the effect of missing items, the data for town D, class W, and for town C, class Y, have been omitted.) Our problem is to determine if there is a significant variation in the indicated pure premium between the different towns and between the different classes of cars.

Our first problem is to set up a normally distributed variable about a mean zero and with homogeneous variance. The true mean, $\bar m_i$, of the distribution of the indicated pure premiums, $P_i$, is unknown. Under the hypothesis that the different towns and classes of cars are homogeneous with each other, we may assume that the $\bar m_i$'s are all equal. We may estimate their value by using the combined indicated $P$ for the whole territory, which is $32.44. By a preliminary argument, which need not concern us here, we show that the variance, $\sigma_i^2$, of an indicated pure premium is inversely proportional to the exposure, $E_i$, on which it is based but the constant of proportionality is unknown. If we now make the assumption that the indicated pure premiums are normally distributed, we may convert them to the form

$x_i = E_i^{1/2}(P_i - P),$

which will be normally distributed about a mean zero with homogeneous variance. These values are entered in the table.

Our data, the $x_i$'s, are subject to the single homogeneous linear restriction,

(9) $0 = \Sigma(L_i - E_i P) = \Sigma E_i^{1/2}x_i.$







The next step is to express the indicated pure premiums for each town and for each class of car as $\theta_j$'s as defined in equation (1). For town A we have an indicated pure premium of $33.21 when all classes of cars are combined. This breaks down as follows:

$33.21 = \Sigma L_i/\Sigma E_i, \quad i = 1, 4, 8,$

$= \Sigma E_i(P_i - 32.44)/\Sigma E_i + 32.44$

$= \Sigma(E_i^{1/2}/\Sigma E_i)\,x_i + 32.44.$

Dividing this by the square root of the sum of the squares of the coefficients, we obtain,

$\theta_1 = (\Sigma E_i)^{1/2}(33.21 - 32.44), \quad i = 1, 4, 8,$

$= \Sigma\,(E_i^{1/2}/(\Sigma E_i)^{1/2})\,x_i,$

which is of the form of (1). We have entered the coefficients of $\theta_1$, except for the common denominator, whose square is entered on line (1'), under Restriction (1) in the table. Similarly, we have entered the values for the other towns under Restrictions (2), (3), and (4). The values for the classes of cars are entered under (5), (6), and (7).
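The equivalence of the two expressions for $\theta_1$ can be seen in a small computation. The sketch below uses invented exposures and losses for a single town (they are not the entries of Table I); only the combined premium 32.44 is taken from the text, and all variable names are hypothetical.

```python
import math

# Hypothetical exposures (car years) and losses for one town; NOT the paper's Table I.
exposures = [120.0, 300.0, 80.0]
losses = [4100.0, 9800.0, 2500.0]
P = 32.44  # combined indicated pure premium quoted in the text

premiums = [L / E for L, E in zip(losses, exposures)]
# x_i = E_i^(1/2) (P_i - P): variables with homogeneous variance
x = [math.sqrt(E) * (Pi - P) for E, Pi in zip(exposures, premiums)]

total_E = sum(exposures)
town_premium = sum(losses) / total_E

# theta written directly from the town premium ...
theta_direct = math.sqrt(total_E) * (town_premium - P)
# ... and as a normalized linear function of the x_i, as in (1)
coeffs = [math.sqrt(E) / math.sqrt(total_E) for E in exposures]
theta_linear = sum(c * xi for c, xi in zip(coeffs, x))
coeff_norm = sum(c * c for c in coeffs)  # should equal 1 after the division

print(theta_direct, theta_linear, coeff_norm)
```

Both forms give the same value, and the squared coefficients sum to one, which is the normalization required of the $a_{ji}$.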

The next step is to orthogonalize the $\theta_j$'s. The first four have no common elements so they are orthogonal by inspection. To make $\theta_5$ orthogonal to $\theta_1$ we must add to $\theta_5$,

$k_{51} = -\Sigma a_{5i}a_{1i}/\Sigma a_{1i}^2$

times $\theta_1$. This and similar coefficients for making $\theta_5$ orthogonal to $\theta_2$, $\theta_3$, and $\theta_4$ are entered on line (2'). We may now replace $\theta_5$ by the equivalent $\theta_5'$ by the formula

(11) $a_{5i}' = a_{5i} + k_{51}a_{1i} + k_{52}a_{2i} + \cdots.$

Similar $k$'s for $\theta_6$ are entered on line (3') and $\theta_6$ is replaced by $\theta_6'$. $\theta_7$ should be ignored since it is algebraically dependent on the other $\theta_j$'s:

^ a ..j. gj ..p ^ 

Note that on line (4') we have entered $\Sigma a_{ji}'^2$ for checking the calculation (11). We next calculate the $\theta_j$'s according to the formula,

$\theta_j = a_{ji}x_i/(\Sigma a_{ji}^2)^{1/2}.$

Note that for this particular design all the $\theta_j$'s except $\theta_5'$ and $\theta_6'$ are numerically equal to the corresponding $x$'s (enclosed in parentheses).

Returning to equation (9), we see that it is equivalent to either of the following restrictions on the $\theta_j$'s:

$\Sigma_{j=1}^{4}\,(\Sigma E_i)_j^{1/2}\,\theta_j = 0$

or

$\Sigma_{j=5}^{7}\,(\Sigma E_i)_j^{1/2}\,\theta_j = 0,$

where $(\Sigma E_i)_j$ is the total exposure entering $\theta_j$. Therefore we may conclude that

$S_1/\sigma^2 = (\theta_1^2 + \theta_2^2 + \theta_3^2 + \theta_4^2)/\sigma^2 = 96{,}469/\sigma^2$

is distributed as is Chi-square with three degrees of freedom. Also we may conclude that

$S_2/\sigma^2 = (\theta_5^2 + \theta_6^2 + \theta_7^2)/\sigma^2 = 79{,}349/\sigma^2$

is distributed as is Chi-square with two degrees of freedom. Note that we have not proved, and indeed it is not so, that $S_1$ and $S_2$ are independent.

We have yet to obtain our interaction sum of squares. Equation (5) is of assistance here, giving,

$S_3/\sigma^2 = [\Sigma x_i^2 - (\theta_1^2 + \theta_2^2 + \theta_3^2 + \theta_4^2 + \theta_5'^2 + \theta_6'^2)]/\sigma^2 = (395{,}360 - 173{,}051)/\sigma^2 = 222{,}309/\sigma^2.$

This is distributed as is Chi-square with 10 - 6 = 4 degrees of freedom. Also it is independent of $S_1$ and of $S_2$.

Lastly we form the variance ratios

$\dfrac{96{,}469/3}{222{,}309/4} = 0.58,$

$\dfrac{79{,}349/2}{222{,}309/4} = 0.71,$

which are not significant.
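The arithmetic of the last step is easy to reproduce. The sketch below only recomputes the quoted figures from the sums of squares and degrees of freedom given in the text; the variable names are hypothetical.

```python
# Sums of squares and degrees of freedom as quoted in the text.
towns_ss, towns_df = 96_469, 3
classes_ss, classes_df = 79_349, 2
total_ss, fitted_ss = 395_360, 173_051

interaction_ss = total_ss - fitted_ss   # interaction sum of squares
interaction_df = 10 - 6                 # ten observations, six fitted restrictions

ratio_towns = (towns_ss / towns_df) / (interaction_ss / interaction_df)
ratio_classes = (classes_ss / classes_df) / (interaction_ss / interaction_df)

print(interaction_ss, round(ratio_towns, 2), round(ratio_classes, 2))
```

Both ratios fall below unity, so neither mean square exceeds the interaction mean square.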

We therefore conclude that as far as the present data and analysis show, we have no reason to believe that these three classes of cars and these four towns are not all subject to the same true premium rate.



DISTRIBUTIONS IN STRATIFIED SAMPLING

By Paul H. Anderson
University of Illinois

1. Introduction. In this paper, distributions of means and standard deviations will be derived for random and stratified samples. It is not necessary to define random sampling here, for one may find it defined in any elementary text. If before drawing the sample from a population $\pi$, it is divided into several strata $\pi_1, \pi_2, \cdots, \pi_s$, and the sample $\Sigma$ is composed of $s$ partial samples $\Sigma_1, \Sigma_2, \cdots, \Sigma_s$ each drawn with or without replacement from the strata; and if the sizes $m_i$ of the partial samples are proportionate to the sizes $M_i$ of corresponding strata, i.e., $m_i = kM_i$, then the sample which is obtained in this manner is a stratified sample. When the sizes of the partial samples are not proportionate to the sizes of the corresponding strata, the distributions of means and standard deviations will differ from the distributions obtained when the sizes of the partial samples are proportionate to the sizes of the corresponding strata. This will be shown in the sections that follow.

The distributions of means and standard deviations from well-known populations for stratified and random samples will be derived and compared, as to scatter and symmetry. It should be remembered that even though stratification has little to recommend its use, in some cases, over random sampling, the impossibility of obtaining random samples makes its use necessary. Since most of the problems with which the practical statistician is confronted are of the kind which make random sampling difficult or even impossible, stratified sampling is being investigated by many research workers.


2. The distribution of means and standard deviations for samples of two drawn from any population having a continuous frequency function. Let $f(x)$ be a continuous frequency function whose mean is zero, and for $a < x < b$, let $f(x) > 0$, elsewhere let $f(x) = 0$. We select a sample of two elements $(x_1, x_2)$ which can be represented by a point in a square of side $b - a$, as point $P$ in Fig. 1. It is well known that the probability of getting a sample point in the element of area $dx_1\,dx_2$ is $f(x_1)f(x_2)\,dx_1\,dx_2$. The probability of getting a value of the mean less than the value of the mean represented by a point on the line $RT$ (Fig. 1) whose equation is $x_1 + x_2 = 2\bar x$ is given by

(1) $\int_a^{2\bar x - a} dx_1 \int_a^{2\bar x - x_1} dx_2\, f(x_1)f(x_2).$

The distribution of $\bar x \le \frac{1}{2}(a + b)$ is

(2) $F(\bar x) = 2\int_a^{2\bar x - a} f(x_1)f(2\bar x - x_1)\,dx_1,$



which is obtained by differentiating (1) with respect to $\bar x$. For all values $\bar x \ge \frac{1}{2}(a + b)$ we must use another equation which we shall now derive similarly. The probability of obtaining a mean less than the mean of any point on $R'T'$ (Fig. 1) is

$1 - \int_{2\bar x - b}^{b} dx_1 \int_{2\bar x - x_1}^{b} dx_2\, f(x_1)f(x_2).$

Differentiating this expression, we obtain

(3) $F(\bar x) = 2\int_{2\bar x - b}^{b} f(x_1)f(2\bar x - x_1)\,dx_1.$

The distribution of means is given by (2) and (3). Let us apply the theorem to the rectangular population

$f(x) = 1, \quad -\frac{1}{2} < x < \frac{1}{2} \quad (-a = b = \frac{1}{2}),$
$\phantom{f(x)} = 0$ elsewhere.

Substituting in (2) and (3) respectively, the results obtained are

$F(\bar x) = 2(1 + 2\bar x)$, for $\bar x \le 0$,

$\phantom{F(\bar x)} = 2(1 - 2\bar x)$, for $\bar x \ge 0$.

J. O. Irwin [1] and Philip Hall [2] obtained these results also but by different methods. However, the distribution of $\bar x$ was known to Laplace and other earlier writers.

From Fig. 1, it is seen that the probability of obtaining a value of $S$ (standard deviation), less than the value of $S$ on $AB$ whose equation $x_2 - x_1 = 2S$ is

$1 - 2\int_a^{b - 2S} dx_1 \int_{x_1 + 2S}^{b} dx_2\, f(x_1)f(x_2).$

Fig. 1

Upon differentiating this expression with respect to $S$, we have

(4) $F_1(S) = 4\int_a^{b - 2S} f(x)f(2S + x)\,dx.$

For the rectangular population $F_1(S) = 4(1 - 2S)$, which agrees with
that found by P. R. Rider [3].
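Formulas (2), (3) and (4) lend themselves to direct numerical evaluation for any frequency function. The sketch below is an illustration only: it uses a midpoint rule, the function names `mean_density`, `sd_density` and `uniform` are hypothetical, and the rectangular population on $(-\frac{1}{2}, \frac{1}{2})$ is used as the test case.

```python
def mean_density(f, a, b, xbar, n=20_000):
    """Density of the mean of two independent draws from f on (a, b), per (2) and (3)."""
    mid = 0.5 * (a + b)
    # (2) applies below the midpoint of the range, (3) above it
    lo, hi = (a, 2 * xbar - a) if xbar <= mid else (2 * xbar - b, b)
    h = (hi - lo) / n
    total = 0.0
    for k in range(n):
        x1 = lo + (k + 0.5) * h
        total += f(x1) * f(2 * xbar - x1)
    return 2 * h * total

def sd_density(f, a, b, s, n=20_000):
    """Density of S = |x2 - x1| / 2 for two independent draws from f, per (4)."""
    hi = b - 2 * s
    h = (hi - a) / n
    total = 0.0
    for k in range(n):
        x = a + (k + 0.5) * h
        total += f(x) * f(x + 2 * s)
    return 4 * h * total

def uniform(x):
    """Rectangular population on (-1/2, 1/2)."""
    return 1.0 if -0.5 < x < 0.5 else 0.0

print(mean_density(uniform, -0.5, 0.5, -0.2))  # compare with 2(1 + 2*(-0.2))
print(mean_density(uniform, -0.5, 0.5, 0.3))   # compare with 2(1 - 2*0.3)
print(sd_density(uniform, -0.5, 0.5, 0.1))     # compare with 4(1 - 2*0.1)
```

Replacing `uniform` by another frequency function gives the corresponding distributions without any change to the two integrals.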

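3. Sampling from a rectangular population. Let the rectangular population be $f(x) = 1$, for $0 \le x \le 1$, elsewhere $f(x) = 0$. From this population we select a stratified sample of two elements which is chosen so that $0 \le x_1 \le \frac{1}{2}$ and $\frac{1}{2} \le x_2 \le 1$. The probability of obtaining a mean less than the mean of any point on the line $R'T'$ (Fig. 2) whose equation is $x_1 + x_2 = 2\bar x$, is

$4\int_0^{2\bar x - \frac{1}{2}} dx_1 \int_{\frac{1}{2}}^{2\bar x - x_1} dx_2 = 8\bar x^2 - 4\bar x + \tfrac{1}{2}.$

Furthermore, the probability of obtaining a mean less than the mean of any point on $RT$ (Fig. 2) whose equation is $x_1 + x_2 = 2\bar x$, is

$1 - 4\int_{2\bar x - 1}^{\frac{1}{2}} dx_1 \int_{2\bar x - x_1}^{1} dx_2 = -8\bar x^2 + 12\bar x - \tfrac{7}{2}.$

Differentiation of the right-hand sides of the above two equations with respect to $\bar x$ yields the distribution of means of stratified samples of two elements from a rectangular distribution function to be

(5) $F(\bar x) = 16\bar x - 4$, for $\frac{1}{4} \le \bar x \le \frac{1}{2}$,
    $\phantom{F(\bar x)} = 12 - 16\bar x$, for $\frac{1}{2} \le \bar x \le \frac{3}{4}$.

The distribution of the means for random samples of two elements from the same rectangular population is

(6) $F(\bar x) = 4\bar x$, for $0 \le \bar x \le \frac{1}{2}$,
    $\phantom{F(\bar x)} = 4 - 4\bar x$, for $\frac{1}{2} \le \bar x \le 1$.

From (5) and (6) we see that

A. The stratified sample means are more stable than the random means.

B. The random sample means and the stratified sample means are both distributed symmetrically.

C. The range of random means is twice the range of the stratified means.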

It remains now to find the distributions of the standard deviations for samples of two elements (one element is selected from each half of the population). All points on $AB$ (Fig. 2) have the same standard deviation. Furthermore the equation of $AB$ is $x_2 - x_1 = 2S$. The probability of obtaining a standard deviation less than the standard deviation of any point on $AB$ (Fig. 2) is

$4\int_{\frac{1}{2} - 2S}^{\frac{1}{2}} dx_1 \int_{\frac{1}{2}}^{x_1 + 2S} dx_2 = 8S^2.$

Furthermore, the probability of getting a standard deviation less than the standard deviation on the line $A'B'$ (Fig. 2) of which the equation is $x_2 - x_1 = 2S$, is

$1 - 4\int_0^{1 - 2S} dx_1 \int_{x_1 + 2S}^{1} dx_2 = -1 - 8S^2 + 8S.$

Differentiation of the right-hand sides of the above two equations with respect to $S$ yields the distribution of standard deviations of stratified samples of two elements from a rectangular distribution function to be

(7) $F_1(S) = 16S$, for $0 \le S \le \frac{1}{4}$,
    $\phantom{F_1(S)} = 8 - 16S$, for $\frac{1}{4} \le S \le \frac{1}{2}$.

The distribution of the standard deviations for random samples of two elements is

(8) $F_1(S) = 4(1 - 2S)$, for $0 \le S \le \frac{1}{2}$.

From (7) and (8) it is easily seen that:

A. The range of the standard deviations for stratified and random samples is the same.

B. The distribution of standard deviations for random samples of two elements is skewed, but the distribution of the standard deviations for stratified samples of two elements is symmetrical.



If we take a random sample of two elements from the rectangular population on the interval $-\frac{1}{2} < x < \frac{1}{2}$, then Student's ratio $t = \bar x/S$ will have the distribution

$F(t) = \dfrac{1}{2(t - 1)^2}$, for $t \le 0$,

$\phantom{F(t)} = \dfrac{1}{2(t + 1)^2}$, for $t \ge 0$.

This result was obtained by Laderman [7] and others. According to the reasoning used by Laderman, the probability of getting a value of $t$ less than the value on $OS$ is (for stratified samples of two elements)

$4\int_{-\frac{1}{2}}^{0} dx_1 \int_0^{x_1(t + 1)/(t - 1)} dx_2 = -\tfrac{1}{2}(t + 1)/(t - 1).$

When $t > 0$, the probability of obtaining a value of $t$ greater than the value on $OS$ is

$4\int_0^{\frac{1}{2}} dx_2 \int_{x_2(t - 1)/(t + 1)}^{0} dx_1.$

It follows easily that the probability of getting a value of $t$ less than the value represented on $OS$ is for stratified samples equal to

$1 - 4\int_0^{\frac{1}{2}} dx_2 \int_{x_2(t - 1)/(t + 1)}^{0} dx_1 = 1 + \tfrac{1}{2}(t - 1)/(t + 1).$

Differentiating the right-hand side of the first and third above equations with respect to $t$, we find the distribution of Student's ratio for stratified samples of two elements from a rectangular population to be

$F(t) = 1/(t - 1)^2$, for $-1 < t \le 0$,

$\phantom{F(t)} = 1/(t + 1)^2$, for $0 \le t < 1$.

Comparing the random sample and stratified sample distributions of $t$, we find that

A. The stratified $t$'s are more stable than the random $t$'s.

B. Both distributions are symmetrical.

C. The range for the stratified $t$'s is $-1 \le t \le +1$, while the range for the $t$'s obtained from random samples is $-\infty < t < +\infty$.
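A simulation gives a quick check on the stratified distribution of $t$. The sketch below assumes that for a sample of two Student's ratio reduces to $(x_1 + x_2)/(x_2 - x_1)$ when $x_1 < 0 < x_2$; the seed, the sample size and the function name `stratified_t_cdf` are arbitrary choices made for illustration.

```python
import random

def stratified_t_cdf(t):
    """Probability of a value less than t, from the two expressions for stratified samples."""
    if t <= 0:
        return -0.5 * (t + 1) / (t - 1)
    return 1 + 0.5 * (t - 1) / (t + 1)

random.seed(1)
n = 200_000
samples = []
for _ in range(n):
    x1 = random.uniform(-0.5, 0.0)   # element from the lower half
    x2 = random.uniform(0.0, 0.5)    # element from the upper half
    if x2 != x1:                     # guard against a zero denominator
        samples.append((x1 + x2) / (x2 - x1))

for t in (-0.5, 0.0, 0.4):
    empirical = sum(s < t for s in samples) / len(samples)
    print(t, empirical, stratified_t_cdf(t))
```

The simulated values all fall between $-1$ and $+1$, in agreement with item C above.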

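By means of a different method, distributions of means of stratified samples will be obtained. Let (A) and (B) be rectangular populations $f(x)$, $f(y)$ respectively, with positive values on the interval 0, 1. From the rectangular population (A) select a stratified sample of two elements $x_1$ and $x_2$ such that $0 \le x_1 \le \frac{1}{2} \le x_2 \le 1$. Then the probability of getting a sample point in the element of area $dx_1\,dx_2$ is $4\,dx_1\,dx_2$. Now let $y_1 = 2x_1$ (change of unit of measurement), $y_2 = 2x_2 - 1$ (change of unit of measurement and translation). Then $4\,dx_1\,dx_2 = dy_1\,dy_2$. We have also that $0 \le y_1 \le 1$, $0 \le y_2 \le 1$. Thus, with respect to the distribution of the means, a stratified sample of two elements from (A) is the same as a random sample of two elements from (B). Now the means for random samples of two elements from (B) have the distribution $g(\bar y)$ which is given by (6) with $\bar y$ written for $\bar x$. Furthermore, we have $\bar y = 2\bar x - \frac{1}{2}$, $d\bar y = 2\,d\bar x$. Hence it follows readily that

$F(\bar x) = 16\bar x - 4$, for $\frac{1}{4} \le \bar x \le \frac{1}{2}$,
$\phantom{F(\bar x)} = 12 - 16\bar x$, for $\frac{1}{2} \le \bar x \le \frac{3}{4}$.

From the rectangular population (A), take a stratified sample of three elements $0 \le x_1 \le \frac{1}{3} \le x_2 \le \frac{2}{3} \le x_3 \le 1$. The sample points will all lie in a cube within the unit cube. Then the probability of getting a sample point in the element of volume $dx_1\,dx_2\,dx_3$ is $27\,dx_1\,dx_2\,dx_3$. Now let $y_1 = 3x_1$, $y_2 = 3x_2 - 1$, $y_3 = 3x_3 - 2$. Therefore $0 \le y_i \le 1$, for $i = 1, 2, 3$. Furthermore, $dy_1\,dy_2\,dy_3 = 27\,dx_1\,dx_2\,dx_3$. With respect to the distribution of the means, a stratified sample of three elements from (A) is the same as a random sample of three elements from (B). Now the means for random samples of three elements from (B) have the distribution

$g(\bar y) = 27\bar y^2/2$, for $0 \le \bar y \le \frac{1}{3}$,
$\phantom{g(\bar y)} = 9(-6\bar y^2 + 6\bar y - 1)/2$, for $\frac{1}{3} \le \bar y \le \frac{2}{3}$,
$\phantom{g(\bar y)} = 27(1 - \bar y)^2/2$, for $\frac{2}{3} \le \bar y \le 1$.

Furthermore $\bar y = 3\bar x - 1$, $d\bar y = 3\,d\bar x$, and for $\bar y = 0, \frac{1}{3}, \frac{2}{3}, 1$, $\bar x = \frac{1}{3}, \frac{4}{9}, \frac{5}{9}, \frac{2}{3}$ respectively. Hence

$F(\bar x) = 81(3\bar x - 1)^2/2$, for $\frac{1}{3} \le \bar x \le \frac{4}{9}$,
$\phantom{F(\bar x)} = 27(-54\bar x^2 + 54\bar x - 13)/2$, for $\frac{4}{9} \le \bar x \le \frac{5}{9}$,
$\phantom{F(\bar x)} = 81(2 - 3\bar x)^2/2$, for $\frac{5}{9} \le \bar x \le \frac{2}{3}$.

Thus we have found the distribution of the means for stratified samples of three elements when one element is selected from each third of the population.

From the rectangular population (A), take a stratified sample of four elements $0 \le x_1 \le \frac{1}{4} \le x_2 \le \frac{1}{2} \le x_3 \le \frac{3}{4} \le x_4 \le 1$. Again, a stratified sample of four elements from (A) (with respect to the distribution of means) is the same as a random sample of four elements from (B). The means for random samples of four elements from (B) have the distribution:

(C) $g(\bar y) = 128\bar y^3/3$, for $0 \le \bar y \le \frac{1}{4}$,
    $\phantom{g(\bar y)} = 8[1 - 24(\bar y - \frac{1}{2})^2 - 48(\bar y - \frac{1}{2})^3]/3$, for $\frac{1}{4} \le \bar y \le \frac{1}{2}$,
    $\phantom{g(\bar y)} = 8[1 - 24(\bar y - \frac{1}{2})^2 + 48(\bar y - \frac{1}{2})^3]/3$, for $\frac{1}{2} \le \bar y \le \frac{3}{4}$,
    $\phantom{g(\bar y)} = 128(1 - \bar y)^3/3$, for $\frac{3}{4} \le \bar y \le 1$.

Since $\bar y = 4\bar x - \frac{3}{2}$, $d\bar y = 4\,d\bar x$, and for $\bar y = 0, \frac{1}{4}, \frac{1}{2}, \frac{3}{4}, 1$, $\bar x = \frac{3}{8}, \frac{7}{16}, \frac{1}{2}, \frac{9}{16}, \frac{5}{8}$ respectively, we have

$F(\bar x) = 512(4\bar x - \frac{3}{2})^3/3$, for $\frac{3}{8} \le \bar x \le \frac{7}{16}$,
$\phantom{F(\bar x)} = 32[1 - 24(4\bar x - 2)^2 - 48(4\bar x - 2)^3]/3$, for $\frac{7}{16} \le \bar x \le \frac{1}{2}$,
$\phantom{F(\bar x)} = 32[1 - 24(4\bar x - 2)^2 + 48(4\bar x - 2)^3]/3$, for $\frac{1}{2} \le \bar x \le \frac{9}{16}$,
$\phantom{F(\bar x)} = 512(\frac{5}{2} - 4\bar x)^3/3$, for $\frac{9}{16} \le \bar x \le \frac{5}{8}$.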
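The change of variable used above works for any number of equal strata. The sketch below is an illustration, not the author's method: it relies on the standard closed form for the density of a sum of $n$ independent uniform variables (often called the Irwin-Hall density), and the function names are hypothetical.

```python
import math

def irwin_hall_pdf(s, n):
    """Density of the sum of n independent uniform(0, 1) variables."""
    if s <= 0 or s >= n:
        return 0.0
    total = sum((-1) ** k * math.comb(n, k) * (s - k) ** (n - 1)
                for k in range(int(math.floor(s)) + 1))
    return total / math.factorial(n - 1)

def stratified_mean_density(xbar, n):
    """Density of the mean when one element is drawn from each of n equal strata of (0, 1).

    With y_i = n*x_i - (i - 1), the sum of the y_i equals n^2 * xbar - n(n - 1)/2.
    """
    s = n * n * xbar - n * (n - 1) / 2.0   # value of y_1 + ... + y_n
    return n * n * irwin_hall_pdf(s, n)    # Jacobian ds/dxbar = n^2

print(stratified_mean_density(0.4, 2))   # compare with 16*0.4 - 4
print(stratified_mean_density(0.5, 3))   # compare with 27(-54*0.25 + 54*0.5 - 13)/2
print(stratified_mean_density(0.5, 4))   # compare with 32/3
```

The same function covers larger $n$, where writing out the separate arcs by hand becomes tedious.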





This is the distribution of the means for stratified samples of four elements (one element from each quartile). We can extend this method to stratified samples of $n$ elements where one element is selected from each stratum and there are $n$ strata. As $n$ increases, we note that

A. The range of the means decreases.

B. The scatter of the means decreases.

C. The number of arcs in the distribution of the means increases.

Take the stratified sample of four elements (two from each half), $0 \le x_1 \le \frac{1}{2}$, $0 \le x_2 \le \frac{1}{2} \le x_3 \le 1$, $\frac{1}{2} \le x_4 \le 1$. With respect to the distribution of the means, a stratified sample of four elements from (A) is the same as a random sample of four elements from (B). Now the means for random samples of four elements from (B) have the distribution (C). Furthermore $\bar y = 2\bar x - \frac{1}{2}$, $d\bar y = 2\,d\bar x$, and for $\bar y = 0, \frac{1}{4}, \frac{1}{2}, \frac{3}{4}, 1$, $\bar x = \frac{1}{4}, \frac{3}{8}, \frac{1}{2}, \frac{5}{8}, \frac{3}{4}$. Thus

$F(\bar x) = 256(2\bar x - \frac{1}{2})^3/3$, for $\frac{1}{4} \le \bar x \le \frac{3}{8}$,
$\phantom{F(\bar x)} = 16[1 - 24(2\bar x - 1)^2 - 48(2\bar x - 1)^3]/3$, for $\frac{3}{8} \le \bar x \le \frac{1}{2}$,
$\phantom{F(\bar x)} = 16[1 - 24(2\bar x - 1)^2 + 48(2\bar x - 1)^3]/3$, for $\frac{1}{2} \le \bar x \le \frac{5}{8}$,
$\phantom{F(\bar x)} = 256(\frac{3}{2} - 2\bar x)^3/3$, for $\frac{5}{8} \le \bar x \le \frac{3}{4}$.

Hence the distribution of the means for stratified samples of four elements (two elements from each half) has been found.

If we take a stratified sample of six elements (three from each half) we find that the graph of the distribution function of the means will consist of six arcs; the range will be $\frac{1}{4} \le \bar x \le \frac{3}{4}$. Thus we see that as we take more elements from each half, the distribution becomes smoother. The number of arcs in the distribution of the means also increases. The range of the means remains the same but scatter decreases as we take more elements from each half of the population (A).

The results so far obtained are true for the rectangular population which is
symmetric. In order to make further comparisons in the distributions of means
and standard deviations for stratified and random samples, let us now consider
the triangular population $y = 2x$ for $0 \le x \le 1$, and $y = 0$ elsewhere. If we take
random samples of two elements from this population, the points represented by
each sample will lie





population. The other is selected from the range ($\tfrac12 < x_2 < 1$), which
contains three quarters of the total population. By use of the geometric
method the distribution of the stratified means is found to be

$$
f(\bar{x}) =
\begin{cases}
16(32\bar{x}^3 - 6\bar{x} + 1)/9, & \text{for } \tfrac14 \le \bar{x} \le \tfrac12, \\
16(30\bar{x} - 9 - 32\bar{x}^3)/9, & \text{for } \tfrac12 \le \bar{x} \le \tfrac34.
\end{cases}
$$

The range of the stratified means is less, and the distribution is more nearly
symmetrical than it is for the random means, as may be seen by comparing the
graphs of the two distribution functions. Thus we see that stratification gives
the means greater stability. The distribution of the standard deviations of the
stratified samples of two elements is:

$$
f(S) =
\begin{cases}
64(3S - 8S^3)/9, & \text{for } 0 \le S \le \tfrac14, \\
128(4S^3 - 3S + 1)/9, & \text{for } \tfrac14 \le S \le \tfrac12.
\end{cases}
$$

Upon comparing the distributions of the standard deviations for random and
stratified samples, we observe that the random case yields a single cubic whereas
the stratified case yields two cubics. The distribution obtained for the stratified
case is more symmetrical than it is for the random case, as may be seen by
sketching the graphs of the two distribution functions. The range for both
distributions is the same.
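As a rough numerical check of the two stratified distributions above, the following sketch integrates each piecewise cubic. It assumes the triangular population y = 2x on (0, 1), with one element drawn from (0, 1/2) and one from (1/2, 1); the function names and the integrator are illustrative, not taken from the paper.

```python
def mean_density(x):
    """Density of the stratified mean of two elements (assumed population y = 2x)."""
    if 0.25 <= x <= 0.5:
        return 16.0 * (32.0 * x ** 3 - 6.0 * x + 1.0) / 9.0
    if 0.5 < x <= 0.75:
        return 16.0 * (30.0 * x - 9.0 - 32.0 * x ** 3) / 9.0
    return 0.0


def sd_density(s):
    """Density of the standard deviation S = (x2 - x1)/2 of the stratified pair."""
    if 0.0 <= s <= 0.25:
        return 64.0 * (3.0 * s - 8.0 * s ** 3) / 9.0
    if 0.25 < s <= 0.5:
        return 128.0 * (4.0 * s ** 3 - 3.0 * s + 1.0) / 9.0
    return 0.0


def integrate(func, lo, hi, steps=20000):
    """Midpoint-rule approximation to the integral of func over [lo, hi]."""
    h = (hi - lo) / steps
    return h * sum(func(lo + (i + 0.5) * h) for i in range(steps))


mean_total = integrate(mean_density, 0.25, 0.75)
sd_total = integrate(sd_density, 0.0, 0.5)

# Expected value of the stratified mean; the two stratum means are 1/3 and 7/9.
expected_mean = integrate(lambda x: x * mean_density(x), 0.25, 0.75)

print(mean_total, sd_total, expected_mean)
```

Both densities integrate to unity, and the expected stratified mean is (1/3 + 7/9)/2 = 5/9 under the assumed population.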

5. Sampling from a normal population. We shall consider a normal population
$F$ having the frequency function $e^{-x^2/2}/\sqrt{2\pi}$, $(-\infty < x < \infty)$, and the kth
moment about the mean will be $\mu_k$. Divide this population into two equal
parts $F_1$ and $F_2$ such that the frequency function of $F_1$ is $2e^{-x^2/2}/\sqrt{2\pi}$,
$(-\infty < x < 0)$, and the frequency function of $F_2$ is $2e^{-x^2/2}/\sqrt{2\pi}$, $(0 < x < \infty)$. The
kth moment of $F_1$ about the origin will be $m_{k1}$, while the kth moment about its
mean will be $\mu_{k1}$; the corresponding kth moments for $F_2$ will be $m_{k2}$ and $\mu_{k2}$
respectively. In what follows $M_i'$ will be the ith moment about the origin of the
distribution sought, while $M_i$ will be the ith moment about the mean. Furthermore,
the constants $\beta_1$, $\beta_2$, $\kappa$ and the measure of skewness which will be used here
are defined in Elderton [8]. Finally, $E[f(x)]$ will be the expected value of $f(x)$.

If we take a random sample of n elements $x_1, x_2, \dots, x_n$ from $F_1$ and a
random sample of n elements $x_{n+1}, \dots, x_{2n}$ from $F_2$, the 2n elements
$x_1, \dots, x_n, x_{n+1}, \dots, x_{2n}$ will be a stratified sample from the population $F$. Let

$$\bar{x}_1 = (1/n)\sum_{i=1}^{n} x_i, \qquad \bar{x}_2 = (1/n)\sum_{i=n+1}^{2n} x_i, \qquad \bar{x} = \tfrac12(\bar{x}_1 + \bar{x}_2);$$

then $\bar{x}$ will be the mean of the stratified sample. By using Tchouproff's [6]
formulae and expected values, we obtain the following values:

$$M_1' = E(\bar{x}) = \tfrac12(m_{11} + m_{12}) = 0,$$

$$M_2 = E(\bar{x}^2) = (\mu_{21} + \mu_{22})/4n = (1 - m_{11}^2)/2n,$$





$$M_3 = E(\bar{x}^3) = (\mu_{31} + \mu_{32})/8n^2 = 0,$$

$$M_4 = E(\bar{x}^4) = \{\mu_{41} + \mu_{42} + 3n(\mu_{21} + \mu_{22})^2 - 3(\mu_{21}^2 + \mu_{22}^2)\}/16n^3,$$

$$\beta_1 = 0, \qquad \beta_2 = M_4/M_2^2 = 3 + 4(\pi - 3)/n(\pi - 2)^2.$$

From these constants, we see that the variance of the stratified means is
$(1 - 2/\pi)/2n$, but the variance of random means of 2n elements is $1/2n$, as is
well-known. Thus it is obvious that the scatter of the stratified means is less
than the scatter of the random means. Furthermore, the stratified means are
distributed symmetrically since $M_3 = 0$. Observing $\beta_2$, we notice that the
distribution of the stratified means is slightly more peaked than normal, since
$\beta_2 > 3$. Since it is well known that random means from a normal population are
normally distributed, the differences between the two distributions are easy to
see. As $n \to \infty$, $\beta_2 \to 3$, so it is reasonably likely that the stratified means
tend to be normally distributed as the size of the sample increases.
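The constants of the stratified mean can be evaluated directly from the central moments of one half of the unit normal. The sketch below is illustrative: the function name is hypothetical, and the half-normal central moments (second moment 1 - 2/pi, fourth moment 3 - 4/pi - 12/pi^2) are standard values supplied here rather than quoted from the paper.

```python
import math


def stratified_normal_mean_constants(n):
    """Return (variance, beta2) of the mean of n elements from each half of N(0, 1)."""
    mu2 = 1.0 - 2.0 / math.pi                           # second central moment of a half
    mu4 = 3.0 - 4.0 / math.pi - 12.0 / math.pi ** 2     # fourth central moment of a half
    m2 = (mu2 + mu2) / (4.0 * n)
    m4 = (2.0 * mu4 + 3.0 * n * (2.0 * mu2) ** 2 - 3.0 * (2.0 * mu2 ** 2)) / (16.0 * n ** 3)
    return m2, m4 / m2 ** 2


for n in (1, 5, 50):
    variance, beta2 = stratified_normal_mean_constants(n)
    random_variance = 1.0 / (2.0 * n)   # variance of a random mean of 2n elements
    print(n, variance, random_variance, beta2)
```

For every n the stratified variance is smaller than 1/2n, and beta2 exceeds 3 by an amount that shrinks like 1/n.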

If we select a random sample (x, y) of two elements from the normal population
$F$, then the variance ($S^2$) will be:

$$S^2 = \tfrac12(x^2 + y^2) - (x + y)^2/4 = (x - y)^2/4.$$

The method of expectations gives us the following values:

$$M_2 = \tfrac12, \quad M_3 = 1, \quad M_4 = \tfrac{15}{4},$$

$$\beta_1 = 8, \quad \beta_2 = 15.$$

Therefore, the skewness of this distribution as measured by Elderton's formula
is 1.414. For a stratified sample, where we select x from $F_1$ and y from $F_2$,
the second, third, and fourth central moments of $S^2$ are:

$$M_2 = (\pi^2 + 2\pi - 2)/2\pi^2,$$

$$M_3 = (4\pi^3 + 7\pi^2 - 12\pi + 8)/4\pi^3,$$

$$M_4 = (15\pi^4 + 30\pi^3 - 40\pi^2 + 24\pi - 12)/4\pi^4.$$

It follows easily that $\beta_1 = 4.71084$, $\beta_2 = 10.284189$, $\kappa = 10.4$,
$2\beta_2 - 3\beta_1 - 6 = .438324$, skewness $= 1.02$. For samples of two elements, the
stratified samples yield a distribution for the variance which is less skewed than
the corresponding distribution of the variances for random samples. The variances
for random samples of two elements are distributed as a Type III curve, while the
variance for stratified samples of two elements is either a Type III or a Type VI curve.
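The shape constants quoted for the variance of samples of two can be recomputed from the central moments. This is a hedged sketch: the helper name is hypothetical, and the skewness expression used is Pearson's measure sqrt(beta1)(beta2 + 3) / (2(5 beta2 - 6 beta1 - 9)), which is assumed here to be the formula the paper cites from Elderton.

```python
import math


def shape_constants(m2, m3, m4):
    """Return (beta1, beta2, skewness) from the 2nd, 3rd and 4th central moments."""
    beta1 = m3 ** 2 / m2 ** 3
    beta2 = m4 / m2 ** 2
    # Pearson's skewness measure; assumed to be the "Elderton formula" of the text.
    skewness = math.sqrt(beta1) * (beta2 + 3.0) / (2.0 * (5.0 * beta2 - 6.0 * beta1 - 9.0))
    return beta1, beta2, skewness


# Random sample of two from N(0, 1): S^2 = (x - y)^2 / 4.
random_case = shape_constants(0.5, 1.0, 3.75)

# Stratified sample: one element from each half of N(0, 1).
pi = math.pi
stratified_case = shape_constants(
    (pi ** 2 + 2 * pi - 2) / (2 * pi ** 2),
    (4 * pi ** 3 + 7 * pi ** 2 - 12 * pi + 8) / (4 * pi ** 3),
    (15 * pi ** 4 + 30 * pi ** 3 - 40 * pi ** 2 + 24 * pi - 12) / (4 * pi ** 4),
)

print(random_case)
print(stratified_case)
```

The random case reproduces beta1 = 8, beta2 = 15 and a skewness of about 1.414; the stratified case gives a smaller skewness, in line with the comparison drawn in the text.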

atratified emm m mtn from 

this point of view is not clear cut.

Suppose that a bias is introduced by taking n elements of
the sample from $F_1$ and by taking 2n elements of the sample from $F_2$. Under
these circumstances, the complete sample will contain 3n elements, and the mean





of the sample will be $\bar{x} = \sum x_i/3n = (\bar{x}_1 + 2\bar{x}_2)/3$. As before, the central
moments and the $\beta$'s are found to be:

$$M_2 = (\mu_{21} + 2\mu_{22})/9n = \mu_{21}/3n, \qquad M_3 = (\mu_{31} + 2\mu_{32})/27n^2 = \mu_{32}/27n^2,$$

$$M_4 = (\mu_{41} - 3\mu_{21}^2 + 9n\mu_{21}^2)/27n^3,$$

$$\beta_1 = \mu_{32}^2/27n\mu_{21}^3, \qquad \beta_2 = \mu_{41}/3n\mu_{21}^2 - 1/n + 3.$$

We notice first that the means are not symmetrically distributed for small values
of n since $\beta_1 \ne 0$, but as $n \to \infty$, $\beta_1 \to 0$, so the means tend to be symmetrically
distributed. It is evident also that $\beta_2 \to 3$ with increasing n; consequently,
the bias which is present for small values of n tends to disappear as n increases.
Incorrect proportioning of the sizes of the partial samples in stratified sampling
introduces an error into the results whose magnitude decreases with an increase
in n.


6. Sampling from a population $y = \phi(x)$. Suppose we have a well-behaved
frequency function $\phi(x)$ of which the first four moments are finite. Furthermore,
it will be required that $\phi(x)$ be continuous and Riemann-integrable.
Divide the total x-axis into K parts $I_1, I_2, \dots, I_K$ with the separating points
$a_0, a_1, \dots, a_K$ in such manner that

$$\int_{-\infty}^{a_1} \phi(x)\,dx = \cdots = \int_{a_{K-1}}^{\infty} \phi(x)\,dx = 1/K.$$

In this section, we extend some of the definitions of the last section; $\mu_{ik}$ will be
the ith moment about the mean of the kth part $I_k$, and $m_{ik}$ will be the ith moment
about the origin of the kth part $I_k$. Take a sample of Kn elements from this
population so that n elements are drawn from each part. The mean of this
sample will be $\bar{x} = \sum_{i=1}^{Kn} x_i/Kn$. We write this as $\bar{x} = \sum_{k=1}^{K} \bar{x}_k/K$, where
$\bar{x}_k = \sum x_i/n$, the sum being taken over the n elements of the kth part. It follows
easily then that:

$$M_2 = \sum_{k=1}^{K} \mu_{2k}/K^2 n, \qquad M_3 = \sum_{k=1}^{K} \mu_{3k}/K^3 n^2,$$

$$M_4 = \Big\{\sum_{k=1}^{K} \mu_{4k} + 3n\Big(\sum_{k=1}^{K} \mu_{2k}\Big)^2 - 3\sum_{k=1}^{K} \mu_{2k}^2\Big\}\Big/K^4 n^3;$$

as $n \to \infty$, $\beta_1 \to 0$, and $\beta_2 \to 3$. Therefore, it is evident that if we divide a
population into K equal parts and take a sample of Kn elements (n elements
from each part), the distribution of the means probably tends to normal as the
number of elements in the sample increases.
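The moment formulas for K equal strata can be put in a small helper. The sketch below is illustrative only: the function name is hypothetical, and the example strata are the K equal sub-intervals of a rectangular population on (0, 1), each of which is itself rectangular with central moments 1/(12K^2), 0 and 1/(80K^4).

```python
def stratified_mean_moments(mu2, mu3, mu4, n):
    """Central moments (M2, M3, M4) of the mean of n elements from each stratum.

    mu2, mu3, mu4 are lists holding the 2nd, 3rd and 4th central moments of
    the individual strata, one entry per stratum.
    """
    k = len(mu2)
    s2 = sum(mu2)
    m2 = s2 / (k ** 2 * n)
    m3 = sum(mu3) / (k ** 3 * n ** 2)
    m4 = (sum(mu4) + 3.0 * n * s2 ** 2 - 3.0 * sum(v * v for v in mu2)) / (k ** 4 * n ** 3)
    return m2, m3, m4


def rectangular_strata(k):
    """Central moments of the k equal sub-intervals of the rectangular population on (0, 1)."""
    return [1.0 / (12 * k ** 2)] * k, [0.0] * k, [1.0 / (80 * k ** 4)] * k


for n in (1, 10, 1000):
    m2, m3, m4 = stratified_mean_moments(*rectangular_strata(4), n)
    beta1 = m3 ** 2 / m2 ** 3
    beta2 = m4 / m2 ** 2
    print(n, m2, beta1, beta2)
```

With four strata and n = 1 the variance is 1/768, and beta2 rises from 2.7 toward 3 as n grows, consistent with the limiting statement in the text.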


7. Summary. Distributions of means for stratified samples have been obtained
for the rectangular population, which is symmetric, and also for a triangular
population, which can be considered an example of a J-shaped population. For


both popu]a,|jgjjg i| ^ 

ability than thn' ^ f^I^lainni fnnu 4.<.wi ]ik, varj- 

tained from the hT.'^!*V*^ J'ainitbA. Th f MraljJfd }'"Mi<j‘]»' ir.*'fMr< tif*" 

„»—1. ^''KKHihitIn(;..w Lfi.u. f5iM) 4” Sh*’ 


tained from the J'ainitbA. Th«' 

sample means nbt fxifil'it Jw f'lipw ww* 

The effect of sS i«Tuiatirm 

deviations is to mnlf ^ wmprmg ih f' »liH!rj}'«i^:"’n ‘'f ’I;*' '‘?^nid«4 

three Domilntift,,,. ^ dislrilnitinn rmtrv •‘vnonfUr Tin** if* l<»r thr 

f’oi' fatratified wilmf 

much more stabl f( . . the nalanplsr Stsj.hi.rH i-atji, 

Thus it is evidenM*^ *** t'anipicH of Jim )*,«!»' *w-< 

samples of a natur *u aamplw |»»A«r'.v raprium 

where it is poo,, i , tiiakes HtratifuHl aamjiJf■(< Morthv of u**' m in'« ar*'h ^^(rk 

tocondSX 

gesting the problem T* ■ ^ *** Rratvful to Prtifrs?tf.r A, H. t r»<h'<rn*' im '*ur* 

^ this iHip{<r and guiding it U» it,-* 

(lU.OjKwm.uo,^,^^^ ^IKFHlKNtT^ 

having any diatriimiinji (d fhi' iiM'sirtui paropl'-o {» ?% ^ 

Icon's't'y,," jf,X^‘'l«''nt'y '*'•4* hniO' flUntMrH#, Sjwfjv] Irfi'fy )trp Pi 

[21 PhiupIIal^ "Thcdig/V ^hhmdnltt, 

in which the va • ‘'tftipatw (nraaiiiplp^ t»f Pur .V «!«*». Ir fo ^ 3‘'>|niilnij'f|ii 

Pmbfthlo,'' »,■ ^ values Ih^IViwo f) snf) I,all wrp VabK'i Wiiijf cmsflllv 

13] Paoi. It, hP 

sampler f ^ ^ '®*'*^huliqn of the ratio ot ll*'" nvao i** rjfVsBiimji ?« 

124-143, non-nonnal iinivcrefB,'' Ym! 2| 4V7*»*, w» 

(4) J. Neyman, "On 

- VJU tho tWo fiiff 

.r, 1 V, “““•’^our.Vni ‘"''•■'’'nt Mpcfls cif lb mprr»fi!ii})w Xmr, 

fill j 1 ^^1*26), pp 321 in/®®®'''®®ndsqu&wlataf)slani dpviftiK'W,'" fc'f’n'lfBiit. Yr4 H 

i«i A A. 

ivi T T^*^*’*^*hntionB’‘ R‘ * ^ntlicttiaticBl pxpprtainm of tb rtifsttwita sif frictiJiBnry 
in JackLaderw^j^^ "The 12 IlDlK 59), pp 54(1 l(t9. U UIO 
isi w T) '^^^'‘■noviiiai **/*hution of Studenl'a ratio for tsaitipka ni ifaw i<ie(m drawn 
1 ■ -rALiM Elderton ^^'’‘verses" Anmko/Malh. $ta !, Vtd 10'IWJJ,}«ji M Mi 

*^yton, i 92 y ^arPM and Ccfwtown, bndori dwrlpi an*! Jilwin 

Edition)^ pp. 239. 



SOLUTION OF A MATHEMATICAL PROBLEM CONNECTED WITH
THE THEORY OF HEREDITY¹

By S. Bernstein


Translated by Emma Lehner
University of California, Berkeley

TnindnUfr';- a Fn-ncli rc‘mun 6 of tlu' articU' liere translated 

nxm‘iiv .i m fit i« due te space restrirtions that in 

discussing Bernstein's work for the Statistical Seminar at the University of
California, it was necessary to refer to the original Russian paper. Because
of the obvious language difficulty together with the extreme rareness of the
Ukrainian publication in this country, and because of the current interest in
the applications of statistical theories to genetics, it seemed advisable to make
this important article available to a much larger class of readers.

It is regretted that, due to the present conditions, it was impracticable to
obtain the author's comments on this translation, and it is hoped that the slight
changes and additions inserted, to clarify some of the more difficult passages,
would have met with his approval.

1. Let us consider N classes of individuals which possess the property that
the cross of any two of these individuals produces an individual belonging to
one of these classes. We will suppose⁴ that the probability of obtaining an
individual of class j from the cross of an individual of class i with an individual
of class k has a definite value $A_{ik}^{j}$; these numbers are the heredity coefficients
of a given biotype. It is obvious that

(1) $\sum_{j=1}^{N} A_{ik}^{j} = 1.$

Let $\alpha_j$ be the probability that an individual belongs to class j; then under
panmixia⁵ the probability of belonging to class j in the next generation is given by

(2) $\alpha_j' = \sum_{i,k} A_{ik}^{j}\,\alpha_i \alpha_k.$


1 „ ..nUUslirtt in the dwrwfM iS'denti/if/wM de I'Ukraine, Vol. 1 (192'1), 

»The oriKiiial ww jnililiwueo m o» 

n tyi’ll't. _ , I I A2S-5!$1 BHl wt. 

) R, Ac. iSie., ' ihiivershy Malhemstiesl Libmry tor their loan of this 

s Tlmnkn are due to the Itrowa umver». y 

⁴ Of course, the relative probability that an offspring belong to class j, given
that the parents belong to classes i and k.

⁵ That is, complete absence of selection.



Similarly we have for the next generation

(3) $\alpha_j'' = \sum_{i,k} A_{ik}^{j}\,\alpha_i' \alpha_k'$

and so on.

The problem which we now propose is as follows:

For what heredity coefficients under panmixia will the distribution
achieved in the second generation remain unaltered in all subsequent generations?

If the heredity coefficients satisfy this condition, then the law of heredity
which corresponds to them is called "stable."

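2. We prove first of all that the Mendelian law is stable. The Mendelian law
has to do with three classes of individuals, the first two of which are pure races,
while the third is a hybrid race such that the cross of an individual of the first
class with an individual of the second class always produces an individual of the
third class. It follows therefore that

$A_{11}^{1} = A_{22}^{2} = A_{12}^{3} = 1$, while

$A_{11}^{2} = A_{11}^{3} = A_{22}^{1} = A_{22}^{3} = A_{12}^{1} = A_{12}^{2} = 0.$

The remaining 9 coefficients are defined as follows:

$A_{13}^{1} = A_{13}^{3} = A_{23}^{2} = A_{23}^{3} = A_{33}^{3} = 1/2,$

$A_{33}^{1} = A_{33}^{2} = 1/4$, while $A_{13}^{2} = A_{23}^{1} = 0.$

If, for simplicity, we denote the probabilities of belonging to the first, second
or third class by $\alpha, \beta, \gamma$, then (2) becomes

(4) $\alpha' = (\alpha + \tfrac12\gamma)^2, \quad \beta' = (\beta + \tfrac12\gamma)^2, \quad \gamma' = 2(\alpha + \tfrac12\gamma)(\beta + \tfrac12\gamma),$

while on iteration we get the equivalent of (3), namely

(5)
$$
\begin{aligned}
\alpha'' &= [(\alpha + \tfrac12\gamma)^2 + (\alpha + \tfrac12\gamma)(\beta + \tfrac12\gamma)]^2 = (\alpha + \tfrac12\gamma)^2(\alpha + \beta + \gamma)^2, \\
\beta'' &= [(\beta + \tfrac12\gamma)^2 + (\alpha + \tfrac12\gamma)(\beta + \tfrac12\gamma)]^2 = (\beta + \tfrac12\gamma)^2(\alpha + \beta + \gamma)^2, \\
\gamma'' &= 2[(\alpha + \tfrac12\gamma)^2 + (\alpha + \tfrac12\gamma)(\beta + \tfrac12\gamma)][(\beta + \tfrac12\gamma)^2 + (\alpha + \tfrac12\gamma)(\beta + \tfrac12\gamma)] \\
&= 2(\alpha + \tfrac12\gamma)(\beta + \tfrac12\gamma)(\alpha + \beta + \gamma)^2.
\end{aligned}
$$

Since $\alpha + \beta + \gamma = 1$, it follows that $\alpha'' = \alpha'$, $\beta'' = \beta'$, $\gamma'' = \gamma'$ and hence
the Mendelian law is stable.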
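The stability of the Mendelian law can be seen by iterating the map (4) in exact rational arithmetic. The sketch below is illustrative; the function name and the starting proportions are arbitrary choices, not taken from the paper.

```python
from fractions import Fraction


def mendel_step(alpha, beta, gamma):
    """One generation under panmixia for the Mendelian law, equation (4)."""
    half = Fraction(1, 2)
    p = alpha + half * gamma
    q = beta + half * gamma
    return p * p, q * q, 2 * p * q


# Arbitrary starting proportions summing to one.
start = (Fraction(1, 2), Fraction(3, 10), Fraction(1, 5))
first = mendel_step(*start)
second = mendel_step(*first)

print(first)
print(second)
```

The first generation differs from the starting proportions, but the second generation is identical to the first, which is the stability property defined in Section 1.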

3. The first rather important result can be stated as follows:

Theorem: If three classes form a closed biotype under a stable heredity law,
which is such that the cross of an individual of the first class with an individual of
the second class always produces an individual of the third class, then the first two
classes represent pure races and the law of heredity is the Mendelian law.





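If the original probabilities are $\alpha, \beta, \gamma$, then the corresponding probabilities
for the next generation can be written as follows:

(6)
$$
\begin{aligned}
\alpha_1 &= A_{11}\alpha^2 + 2A_{12}\alpha\beta + A_{22}\beta^2 + 2A_{13}\alpha\gamma + 2A_{23}\beta\gamma + A_{33}\gamma^2 = f(\alpha, \beta, \gamma), \\
\beta_1 &= B_{11}\alpha^2 + 2B_{12}\alpha\beta + B_{22}\beta^2 + 2B_{13}\alpha\gamma + 2B_{23}\beta\gamma + B_{33}\gamma^2 = \varphi(\alpha, \beta, \gamma), \\
\gamma_1 &= C_{11}\alpha^2 + 2C_{12}\alpha\beta + C_{22}\beta^2 + 2C_{13}\alpha\gamma + 2C_{23}\beta\gamma + C_{33}\gamma^2 = \psi(\alpha, \beta, \gamma),
\end{aligned}
$$

where $A_{ik} + B_{ik} + C_{ik} = 1$. Since $C_{12} = 1$, by assumption, it follows that

$A_{12} = B_{12} = 0,$

since all the coefficients must be positive, or zero.

The mathematical problem before us consists in determining three homogeneous
quadratic forms $f$, $\varphi$, and $\psi$ in $\alpha, \beta, \gamma$ with non-negative coefficients such
that

$f + \varphi + \psi = (\alpha + \beta + \gamma)^2$

and satisfying the conditions of stability, namely

(7)
$$
\begin{aligned}
f(\alpha_1, \beta_1, \gamma_1) &= f(\alpha, \beta, \gamma) = \alpha_1, \\
\varphi(\alpha_1, \beta_1, \gamma_1) &= \varphi(\alpha, \beta, \gamma) = \beta_1, \\
\psi(\alpha_1, \beta_1, \gamma_1) &= \psi(\alpha, \beta, \gamma) = \gamma_1,
\end{aligned}
$$

for all $\alpha, \beta, \gamma$ such that⁶ $\alpha + \beta + \gamma = 1$. The third equation is, of course, a
consequence of the first two.

The functions $f$, $\varphi$ and $\psi$, being continuous, assume infinitely many values,
unless they are constants, in which case they may be expressed as quadratic
forms by

$f = p(\alpha + \beta + \gamma)^2, \quad \varphi = q(\alpha + \beta + \gamma)^2, \quad \psi = r(\alpha + \beta + \gamma)^2,$

where p, q, r are constants. But, since the coefficient of $\alpha\beta$ is zero in $f$ and $\varphi$,
and 2 in $\psi$, this reduces to the trivial case $f = 0$, $\varphi = 0$ and $\psi = 1$, which we
can neglect.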

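We now write (7) in the form

(8)
$$
\begin{aligned}
\alpha_1 &= f(\alpha, \beta, \gamma) = \alpha(\alpha + \beta + \gamma) + F_1(\alpha, \beta, \gamma), \\
\beta_1 &= \varphi(\alpha, \beta, \gamma) = \beta(\alpha + \beta + \gamma) + F_2(\alpha, \beta, \gamma), \\
\gamma_1 &= \psi(\alpha, \beta, \gamma) = \gamma(\alpha + \beta + \gamma) - F_1(\alpha, \beta, \gamma) - F_2(\alpha, \beta, \gamma).
\end{aligned}
$$

Since

(9)
$$
\begin{aligned}
f(\alpha, \beta, \gamma) &= \alpha(\alpha + \beta + \gamma), \\
\varphi(\alpha, \beta, \gamma) &= \beta(\alpha + \beta + \gamma), \\
\psi(\alpha, \beta, \gamma) &= \gamma(\alpha + \beta + \gamma),
\end{aligned}
$$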


⁶ The fact that the variables $\alpha, \beta, \gamma$ are not independent does not preclude the validity
of identifying their coefficients in the equations that follow, since all these equations are
homogeneous.





are obviously solutions of (7), it follows that $F_1(f, \varphi, \psi) = 0$ and $F_2(f, \varphi, \psi) = 0$.

But, as we have just seen, $f$, $\varphi$ and $\psi$ assume infinitely many values. Therefore
$F_1$ and $F_2$ either have a linear factor in common, or else are proportional and
irreducible.⁷

We first show that $F_1$ and $F_2$ do not have only a linear factor, $l\alpha + m\beta + n\gamma$,
in common, for if they did this factor would vanish for $\alpha = f$, $\beta = \varphi$, $\gamma = \psi$,
so that

(10) $l f(\alpha, \beta, \gamma) + m \varphi(\alpha, \beta, \gamma) + n \psi(\alpha, \beta, \gamma) = 0.$

But since neither $f$ nor $\varphi$ has a term in $\alpha\beta$, while $\psi$ has, $n = 0$. Also, since
$f$ and $\varphi$ have no negative coefficients, $l$ and $m$ are of opposite signs. Let $l = 1$,
while $m = -p$, where $p \ge 0$. Then the third equation (8) can be written

(11) $\psi(\alpha, \beta, \gamma) = \gamma(\alpha + \beta + \gamma) + (A\alpha + B\beta + C\gamma)(\alpha - p\beta).$

The coefficients of $\alpha^2$ and $\beta^2$ in $\psi$ must be non-negative. Therefore it follows
that $A \ge 0$, while $B \le 0$. But the coefficient of $\alpha\beta$ in $\psi$ is 2, while $B - Ap$
cannot be positive. Therefore $F_1$ and $F_2$ have no linear factor in common, and
must be proportional. But since the coefficient of $\alpha\beta$ in $f$ and $\varphi$ is zero, the
coefficient of $\alpha\beta$ in both $F_1$ and $F_2$ must be $-1$, and therefore $F_1$ and $F_2$ are
equal and we can write $F_1 = F_2 = F$, and (8) becomes:

(12)
$$
\begin{aligned}
f(\alpha, \beta, \gamma) &= \alpha(\alpha + \beta + \gamma) + F(\alpha, \beta, \gamma), \\
\varphi(\alpha, \beta, \gamma) &= \beta(\alpha + \beta + \gamma) + F(\alpha, \beta, \gamma), \\
\psi(\alpha, \beta, \gamma) &= \gamma(\alpha + \beta + \gamma) - 2F(\alpha, \beta, \gamma),
\end{aligned}
$$

where $F$ is an irreducible, homogeneous, quadratic form in $\alpha, \beta, \gamma$. Furthermore,
the coefficient of $\alpha^2$ in $F$ must be zero, since were it positive, the coefficient
of $\alpha^2$ in $f$ would exceed 1, and were it negative, the coefficient of $\alpha^2$ in $\varphi$ would
also be negative, which is impossible. Similarly, the coefficient of $\beta^2$ in $F$ is
also zero. We can therefore write $F$ in the form

(13) $F(\alpha, \beta, \gamma) = -\alpha\beta + c\alpha\gamma + d\beta\gamma + e\gamma^2.$

Moreover, we know that

(14) $F(\alpha', \beta', \gamma') = F(\alpha S + F, \beta S + F, \gamma S - 2F) = 0,$

where $S = \alpha + \beta + \gamma$, for all values of $\alpha, \beta, \gamma$ such that $\alpha + \beta + \gamma = 1$. Expanding (14) in
Taylor series about the point $(\alpha S, \beta S, \gamma S)$ in three space, we get only three terms
in the expansion, since all the derivatives of order greater than the second are
identically zero, and the constant term can be obtained very simply by putting
$\alpha = \beta = \gamma = 0$ in $F(\alpha S + F, \beta S + F, \gamma S - 2F)$. In this way we have

⁷ See Bôcher, Introduction to Higher Algebra, p. 210, Theorem 3.





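(15)
$$
\begin{aligned}
F(\alpha S + F, \beta S + F, \gamma S - 2F) &= F(\alpha S, \beta S, \gamma S) \\
&\quad + F[F'_\alpha(\alpha S, \beta S, \gamma S) + F'_\beta(\alpha S, \beta S, \gamma S) - 2F'_\gamma(\alpha S, \beta S, \gamma S)] \\
&\quad + F(F, F, -2F) = 0.
\end{aligned}
$$

$F$ is a homogeneous form of the second degree, so that

(16) $F(\alpha S, \beta S, \gamma S) = S^2 F(\alpha, \beta, \gamma), \qquad F(F, F, -2F) = F^2 \cdot F(1, 1, -2),$

while its derivatives with respect to $\alpha, \beta, \gamma$ are homogeneous linear forms so that

(17) $F'_\alpha(\alpha S, \beta S, \gamma S) = S F'_\alpha(\alpha, \beta, \gamma).$

Substituting these in (15) and dividing out by $F(\alpha, \beta, \gamma)$, which is not identically
zero, we get

(18) $S^2 + S[F'_\alpha(\alpha, \beta, \gamma) + F'_\beta(\alpha, \beta, \gamma) - 2F'_\gamma(\alpha, \beta, \gamma)] = -F(\alpha, \beta, \gamma)F(1, 1, -2).$

But since $F(\alpha, \beta, \gamma)$ is irreducible, $F(1, 1, -2)$ must be zero. Dividing by $S$
we finally get

(19) $S = 2F'_\gamma - F'_\alpha - F'_\beta$

or $(\alpha + \beta + \gamma) = 2(c\alpha + d\beta + 2e\gamma) - (-\beta + c\gamma) - (-\alpha + d\gamma)$, from which
it follows that

$c = d = 0, \quad e = 1/4,$

and hence

(20) $F(\alpha, \beta, \gamma) = \gamma^2/4 - \alpha\beta.$

Therefore we have found that

(21)
$$
\begin{aligned}
f(\alpha, \beta, \gamma) &= \alpha(\alpha + \beta + \gamma) + \gamma^2/4 - \alpha\beta = \alpha^2 + \alpha\gamma + \gamma^2/4 = (\alpha + \tfrac12\gamma)^2, \\
\varphi(\alpha, \beta, \gamma) &= \beta(\alpha + \beta + \gamma) + \gamma^2/4 - \alpha\beta = \beta^2 + \beta\gamma + \gamma^2/4 = (\beta + \tfrac12\gamma)^2, \\
\psi(\alpha, \beta, \gamma) &= \gamma(\alpha + \beta + \gamma) - \tfrac12\gamma^2 + 2\alpha\beta = 2(\alpha + \tfrac12\gamma)(\beta + \tfrac12\gamma),
\end{aligned}
$$

which is the Mendelian law.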

4. We have therefowi shown that Uie Mendelian law is a necessary conae- 
quenee of any stable law, which is such that the crcms of tlio first two classes 
produces the tliird hybrid class. We have not even aasumed a prion tliat the 
first two classes are pur© races. From a theoretical point of view it is interesting 
to investigate the powibility of crossing two pure races under different laws of 
heredity, which are nevertheless stable, ^ 

We will therefore suppose to start with that the coefficients of a in/(o, 0. 7 ) 



58 


S. HERN'RTBIN 


and of |3^ in <p{a, /3, y) are equal to unity. BeRinninK n-ith equafiarm t-Sl of the 
previous section, which merely c.\prcfv*< the condititm that the her(‘rlity law under 
consideration be stable we can write 

(22) Fi = F = — aap + r«Y + d0y d . 

As before, Fi and Ft cannot have a linear factor in common, henrr fliey are 
pi’oportional, and we can write Ft = 'SF, Fi » F. We theredtut* have (h'c 
coefEcients to determine; a, c, d, e, and X. Since’ 

(23) FiaS d- F, 0S + XF, yS ~ (X + 1)F) 0 

we have an analogue of (19) 

(24) ^ = (1 + x)f; - Fi - xf; 

or 

a + |9 + 7 = (1 + X)(cq: + d/3 + 2rj') 4- Ojtf -- cy H- X(i« — Xdy. 
Equating coefficients of a, jS, y as Ircfore wo have 
1 = 0(1 + X) + aX or c = (1 - Xa)/(1 + X), 

1 = d(l + X) + a or d = (1 — a)/(l + X), 

1 = 2fl(l + X) - c - Xd * 2e(l + X) - (1 ~ Xo)/(l + X) ~ (1 - n)/(l + X), 

or 

0 = (1 + X - aX)/(l + X)*. 

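Therefore the most general quadratic form $F$ satisfying our conditions can be
written

(26) $F(\alpha, \beta, \gamma) = -a\alpha\beta + \dfrac{1 - \lambda a}{1 + \lambda}\,\alpha\gamma + \dfrac{1 - a}{1 + \lambda}\,\beta\gamma + \dfrac{1 + \lambda - a\lambda}{(1 + \lambda)^2}\,\gamma^2.$

If we let $a\lambda = b$, this becomes

(27) $F(\alpha, \beta, \gamma) = -a\alpha\beta + \dfrac{a(1 - b)}{a + b}\,\alpha\gamma + \dfrac{a(1 - a)}{a + b}\,\beta\gamma + \dfrac{a(a + b - ab)}{(a + b)^2}\,\gamma^2.$

Substituting this value of $F$ into $f$, $\varphi$, $\psi$ and simplifying, we get

(28)
$$
\begin{aligned}
f(\alpha, \beta, \gamma) &= \Big(\alpha + \frac{a}{a + b}\gamma\Big)\Big[\alpha + (1 - a)\beta + \Big(1 - \frac{ab}{a + b}\Big)\gamma\Big], \\
\varphi(\alpha, \beta, \gamma) &= \Big(\beta + \frac{b}{a + b}\gamma\Big)\Big[(1 - b)\alpha + \beta + \Big(1 - \frac{ab}{a + b}\Big)\gamma\Big], \\
\psi(\alpha, \beta, \gamma) &= (a + b)\Big(\alpha + \frac{a}{a + b}\gamma\Big)\Big(\beta + \frac{b}{a + b}\gamma\Big),
\end{aligned}
$$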




where in order that all the coefficients be positive it is necessary and sufficient
that $0 < a \le 1$, and $0 < b \le 1$. In case $a = b = 1$ formulae (28) coincide
with (21) and we have the Mendelian law.
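A quick exact-arithmetic check of the two-parameter law (28) is sketched below. The function name and the particular values of a, b and the starting proportions are arbitrary illustrations, not taken from the paper; the factored forms are those of (28) with b written for the product of a and lambda.

```python
from fractions import Fraction


def two_pure_race_law(a, b, alpha, beta, gamma):
    """One generation under the stable law (28) with parameters a and b."""
    p = a / (a + b)
    q = b / (a + b)
    r = 1 - a * b / (a + b)
    f = (alpha + p * gamma) * (alpha + (1 - a) * beta + r * gamma)
    phi = (beta + q * gamma) * ((1 - b) * alpha + beta + r * gamma)
    psi = (a + b) * (alpha + p * gamma) * (beta + q * gamma)
    return f, phi, psi


a, b = Fraction(1, 3), Fraction(3, 4)
start = (Fraction(1, 2), Fraction(3, 10), Fraction(1, 5))

first = two_pure_race_law(a, b, *start)
second = two_pure_race_law(a, b, *first)

# With a = b = 1 the law reduces to the Mendelian case (21).
mendel = two_pure_race_law(Fraction(1), Fraction(1), *start)

print(first, second, mendel)
```

The proportions sum to one after each step and the second generation repeats the first, so the law is stable for these parameter values; setting a = b = 1 reproduces the Mendelian proportions.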

The question of whether there actually exist heredity laws which satisfy (28)
with $a < 1$, and $b < 1$ can only be solved experimentally. Theoretically formulae
(28) give the general heredity law of a closed biotype consisting of
three classes, with the condition that two of the three classes be pure races. It
is easy to see that the only law of heredity in which all three classes are pure
races is given by the particular solution of (8)

(29) $f = \alpha(\alpha + \beta + \gamma), \quad \varphi = \beta(\alpha + \beta + \gamma), \quad \psi = \gamma(\alpha + \beta + \gamma),$

in which $F_1 = F_2 = 0$.

5. Supposing as before that the heredity law is stable, it remains to prove
the following theorem to exhaust all possible biotypes consisting of only three
classes.

Theorem: If all classes are hybrid, then

(30) $f = p(\alpha + \beta + \gamma)^2, \quad \varphi = q(\alpha + \beta + \gamma)^2, \quad \psi = r(\alpha + \beta + \gamma)^2.$

If only one of the classes represents a pure race, then either

(31)
$$
\begin{aligned}
f &= (\alpha + \beta)[\tfrac12(1 + b)(\alpha + \beta) + (1 - d)\gamma], \\
\varphi &= (\alpha + \beta)[\tfrac12(1 - b)(\alpha + \beta) + d\gamma], \\
\psi &= \gamma(\alpha + \beta + \gamma)
\end{aligned}
$$

or

(32) $f = \alpha S + a\alpha(\mu\beta + \gamma)$ and $\mu\varphi + \psi = 0.$

We have seen that if $f$, $\varphi$, and $\psi$ are functions of $(\alpha + \beta + \gamma)$, then we arrive
at (30); in the contrary case we arrive at (8). Here we distinguish two cases:
1) $F_1$ and $F_2$ are irreducible quadratic forms which are proportional: $F_1 = k_1 F$,
$F_2 = k_2 F$, and 2) $F_1$ and $F_2$ have a common factor, which is a linear form. Suppose
at first that $F$ is a quadratic form. If none of the numbers $k_1$, $k_2$, and
$k_1 + k_2$ is zero, then two of them may be taken as positive, say $k_1$ and $k_2$. But
then the coefficients of $\alpha^2$ and $\beta^2$ in $F$ would have to vanish in order that $\psi$ have
no negative coefficients. But this case of two pure races has already been
discussed, and leads to formulae (28). We must therefore suppose next that
one of the numbers $k_1$, $k_2$, or $k_1 + k_2$ is zero. Suppose that $k_1 + k_2 = 0$, that
is, that the third class is a pure race, and hence the coefficient of $\gamma^2$ in $\psi$ is unity.
Therefore, the coefficient of $\gamma^2$ in $F$ must be zero. We can take $k_1 = 1$, then
$k_2 = -1$, and therefore the coefficient of $\alpha\gamma$ in $F$ is negative, say $-d$. We can
now write





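(33) $F = a\alpha^2 + b\alpha\beta + c\beta^2 - d\alpha\gamma + e\beta\gamma.$

We have as before

(34) $F(\alpha S + F, \beta S - F, \gamma S) = 0,$

from which we derive by Taylor's expansion

(35) $S = F'_\beta - F'_\alpha$

or in other words

$\alpha + \beta + \gamma = b\alpha + 2c\beta + e\gamma - b\beta + d\gamma - 2a\alpha,$

which leads to

(36) $F = \tfrac12(b - 1)\alpha^2 + b\alpha\beta + \tfrac12(b + 1)\beta^2 - d\alpha\gamma + (1 - d)\beta\gamma$

and hence to $f$ and $\varphi$, which are as follows,

(37)
$$
\begin{aligned}
f &= (\alpha + \beta)[\tfrac12(1 + b)(\alpha + \beta) + (1 - d)\gamma], \\
\varphi &= (\alpha + \beta)[\tfrac12(1 - b)(\alpha + \beta) + d\gamma].
\end{aligned}
$$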

It now remains to suppose that $F$ is a linear form. Let

(38) $F = \lambda\alpha + \mu\beta + \gamma.$

Here the condition that the heredity law be stable leads as before to the equation

(39) $S = (k + k_1)F'_\gamma - kF'_\alpha - k_1F'_\beta = (k + k_1) - \lambda k - \mu k_1,$

where $k$ and $k_1$ are linear forms

(40) $k = a\alpha + b\beta + c\gamma, \qquad k_1 = a_1\alpha + b_1\beta + c_1\gamma.$

Hence if we had no restrictions on signs and magnitudes we could select $k$
arbitrarily, and then we would have $k_1 = [S + (\lambda - 1)k]/[1 - \mu]$, and the
solution for $f$, $\varphi$, $\psi$ would depend on five parameters, $(\lambda, \mu, a, b, c)$.

But since in $f = \alpha S + kF$ the coefficients of $\beta^2$, $\beta\gamma$ and $\gamma^2$ are non-negative,
$\mu b \ge 0$, and $b + \mu c \ge 0$, $c \ge 0$, and similarly from the same property of $\varphi$ we
have $\lambda a_1 \ge 0$, $c_1 \ge 0$, $a_1 + \lambda c_1 \ge 0$. But $\mu$ and $\lambda$ cannot both be non-negative,
for then $\lambda f + \mu\varphi + \psi = 0$ would be impossible.

Let $\mu < 0$, then $b = c = 0$, but then the coefficient of $\alpha^2$ in $f$ would be $1 + a\lambda$,
which will be too big, unless $\lambda = 0$. Hence, $F = \mu\beta + \gamma$, $k = a\alpha$, and

$f = \alpha S + a\alpha(\mu\beta + \gamma),$

$\psi = -\mu\varphi = \mu[S(\beta + \gamma) - a\alpha(\mu\beta + \gamma)]/[\mu - 1].$

Hence we have exhausted all possible cases and have proved our theorem.
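The two families with a single pure race can be checked for stability in the same way as the Mendelian law. The sketch below is illustrative: the function names, the parameter values and the starting proportions are arbitrary, and the formulas are those of (31) and (32) with S written out as the sum of the three proportions.

```python
from fractions import Fraction


def law_31(b, d, alpha, beta, gamma):
    """One generation under law (31): the third class is a pure race."""
    s = alpha + beta
    half = Fraction(1, 2)
    f = s * (half * (1 + b) * s + (1 - d) * gamma)
    phi = s * (half * (1 - b) * s + d * gamma)
    psi = gamma * (alpha + beta + gamma)
    return f, phi, psi


def law_32(a, mu, alpha, beta, gamma):
    """One generation under law (32), where mu is negative and psi = -mu * phi."""
    total = alpha + beta + gamma
    f = alpha * total + a * alpha * (mu * beta + gamma)
    phi = (total * (beta + gamma) - a * alpha * (mu * beta + gamma)) / (1 - mu)
    psi = -mu * phi
    return f, phi, psi


start = (Fraction(1, 2), Fraction(3, 10), Fraction(1, 5))

first_31 = law_31(Fraction(1, 4), Fraction(1, 3), *start)
second_31 = law_31(Fraction(1, 4), Fraction(1, 3), *first_31)

first_32 = law_32(Fraction(1, 2), Fraction(-2), *start)
second_32 = law_32(Fraction(1, 2), Fraction(-2), *first_32)

print(first_31, second_31)
print(first_32, second_32)
```

In both cases the proportions reached in the first generation are reproduced exactly in the second.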





6. We can summarize our results as follows. The heredity laws of a closed
biotype of three classes which are stable can be divided into the following types:

1. Two classes represent pure races. The heredity laws are given by (28),
and in particular for the Mendelian case by (21).

2. There are no pure races, and every race can be obtained by crossing the
other races. The heredity law is given by (30).

3. All three classes are pure races. The heredity law is given by (29). Any
two classes of this biotype also form a closed biotype.

4. One of the classes is a pure race. The heredity laws are given by (31)
and (32).



SOME RECENT ADVAJNCES IN MATHEMATICAL STATISTICS, I 

By Burton H. Camp¹

Wesleyan University

The papers considered in this partial review are listed at the end. For the
most part they have appeared within the last five years, but in order to explain
what has been done within the last five years it has been necessary occasionally
to use material that appeared earlier. The subject matter is divided into
four parts.


Part I. The Theory of Tests. Since an attempt is being made to present
the material of this paper in such a form that it may be read rapidly by those
who have not read the underlying literature, the author will endeavor to do little
more, in Part I, than to define and illustrate several terms which are being used.
Altogether there are nine of these terms. It is fortunate that their meanings
can be explained pretty well by reference to an extremely simple picture. Let
each of the curves in the figure indicate a probability distribution $p(x; \theta)$, in
which there is a single variate x and a single parameter $\theta$.

Example 1. $p(x; \theta) = \dfrac{1}{\sqrt{2\pi}}\,e^{-(x - \theta)^2/2}$, the normal distribution in which the
center is at $x = \theta$, and the standard deviation is unity.

Let a random sample E be drawn from a population indicated by such a
curve. In the simplest case $E = x$, a single individual. Shortly, we shall have
to suppose that there are N individuals: $E = x_1, \dots, x_N$. Eventually, the
picture will be generalized much further. The population will be described by a
function of n variables, so that, in place of each x of our sample, we shall have

¹ One of two papers read by Cecil C. Craig and by the author at a joint meeting of the
Institute of Mathematical Statistics, the Econometric Society and the American Statistical
Association, held in New York City on December 30, 1941.



$x^{(1)}, \dots, x^{(n)}$; moreover there will be, not one parameter, but $l$ parameters
$\theta^{(1)}, \dots, \theta^{(l)}$; so that our probability function will be multivariate and will
be denoted by

$p(x^{(1)}, \dots, x^{(n)}; \theta^{(1)}, \dots, \theta^{(l)}).$

A common way of putting this is to say that $x$ and $\theta$ are vectors in $n$ and $l$
dimensions, respectively, and so to have the form as originally, $p(x; \theta)$. In the
figure the space which the samples ($E = x$) can occupy is of course not more
than the x-axis, but in the most general case the sample space will be a part or
all of a space of $nN$ dimensions and will be denoted by $W$. As is well understood,
a significance test is an inequality which specifies in $W$ a certain region
$w$ as a critical region, and if E is in this $w$ the hypothesis being tested is rejected.
For example, in the figure, one might test the hypothesis $H_0$ that $\theta = \theta_0$. The
rejection region $w_0$ might be the part of the x-axis where $x > x_0$. In all such
cases we shall let $\alpha$ equal the probability that E is in $w_0$ if $\theta = \theta_0$. This statement
will be denoted as follows:

(1) $\alpha = P(E \in w_0 \mid \theta_0),$

P standing for probability.

(i) Power of a test. A good test should satisfy two conditions: (a) if our
sample is drawn from the population specified by $\theta_0$, the hypothesis $H_0$ that
$\theta = \theta_0$ should be accepted as often as possible, and (b) if our sample is drawn
from a population specified by some other value of $\theta$, say $\theta_1$, then the hypothesis
that $\theta = \theta_1$ should also be accepted as often as possible. Suppose first that
there are but these two admissible populations. The probability of (a) is $1 - \alpha$.
We commonly make the artificial requirement that this shall be some large
fraction such as 0.99. The probability of (b) is commonly denoted by $\beta$, and
in the figure, when $w = w_0$, $\beta$ is the area under the $\theta_1$ curve which lies to the
right of $x = x_0$. Relative to $\theta_0$, $\theta_1$, and $\alpha$, the quantity $\beta$ is called the power
of that test which designates $w_0$ as the critical region. Also, $\alpha$ and $(1 - \beta)$
are the probabilities of the so-called errors of the first and second kinds,
respectively.

(ii) Unbiased test. As stated, we would like to have $\beta$ large. In any case
we would like to have $\beta \ge \alpha$. If $\beta \ge \alpha$, the test and the corresponding region
$w_0$ are "unbiased" (relative to the preassigned quantities $\theta_0$, $\theta_1$, and $\alpha$). The
region $w_0$ appears to be unbiased in our figure. This definition can obviously
be extended to the case where, in addition to $\theta_1$, there is an infinity of admissible
values of $\theta$; then the test is unbiased relative to the whole family of admissible
values of $\theta$ if, for every one of these $\theta$'s, $\beta \ge \alpha$.

(iii) UMP test and CBC region. If, with respect to a family of admissible
$\theta$'s, a critical region $w_0$ exists such that, for each of these $\theta$'s ($\ne \theta_0$), $\beta$ is greater
than it would be for any other critical region satisfying (1), then this $w_0$ is said
to be the common best critical (CBC) region and the corresponding test is the
uniformly most powerful (UMP) test.





(iv) UMPU test and CBCU region. If there is no CBC region, still it may
happen that, if one restricts one's view to only unbiased regions, there may be
among them a CBC region. Such a region is said to be a common best critical
unbiased (CBCU) region, and the corresponding test is the uniformly most
powerful unbiased (UMPU) test.

In the following examples, and elsewhere, we use $H_0$ to indicate the
hypothesis being tested, $H^*$ to indicate all admissible alternatives.

Example 2: $p(x; \theta)$ normal as in Example 1; $E = x$; $H_0$: $\theta = \theta_0$; $H^*$: $\theta > \theta_0$.
The CBC region is where $x > k$ if

$$\int_k^\infty p(x; \theta_0)\,dx = \alpha.$$

This region is the interval indicated by $w_0$ in the figure.

Example 3: Same as the preceding example except that now we have
$H^*$: $\theta \ne \theta_0$. There is no CBC region, but the CBCU region consists of two tail
intervals, where $|x| > k$ if

$$\int_k^\infty p(x; \theta_0)\,dx = \tfrac12\alpha.$$


A little reflection will convince the reader that the statements in these two
examples are at least apparently true. It is geometrically evident, for example,
that the last mentioned region (two tail intervals) is not as powerful with respect
to the alternatives of Example 2 ($\theta > \theta_0$) as is the single tail region $w_0$ in the
figure.
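The comparison between the single-tail and two-tail regions can be made concrete with the unit-variance normal of Example 1. The sketch below is illustrative: the function names, the default level of 0.05 and the choice of a region centred on the tested value are assumptions made here, and the standard-library `statistics.NormalDist` class supplies the normal integrals.

```python
from statistics import NormalDist


def one_tail_power(theta, theta0=0.0, alpha=0.05):
    """Power of the region x > k, with k fixed so that the size is alpha at theta0."""
    k = NormalDist(theta0, 1.0).inv_cdf(1.0 - alpha)
    return 1.0 - NormalDist(theta, 1.0).cdf(k)


def two_tail_power(theta, theta0=0.0, alpha=0.05):
    """Power of the two equal tail intervals centred on theta0, of total size alpha."""
    half_width = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    dist = NormalDist(theta, 1.0)
    return dist.cdf(theta0 - half_width) + 1.0 - dist.cdf(theta0 + half_width)


for theta in (-1.0, 0.0, 1.0):
    print(theta, one_tail_power(theta), two_tail_power(theta))
```

For an alternative above the tested value the single tail has the larger power, while for an alternative below it the single-tail power falls under the size of the test, so that region is biased against such alternatives; the two-tail region keeps its power at or above the size on both sides.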

(v) Type A regions. It is often difficult to find even a CBCU region, or such
a region may not exist, but it may be that there is a region which has the required
properties if one admits only values of $\theta$ near to the value $\theta_0$ being tested. Type
A regions have this property. More precisely, they have the property that
the power of $w_0$ is a minimum at $\theta_0$ with respect to small changes in $\theta$, and that
this is a sharper minimum at $\theta_0$ than is the power of any other $w$ which satisfies
equation (1). Here the words "small changes" are used as in the calculus.
The full definition [4] of an unbiased region of type A is that it shall satisfy (1)
and also the following conditions:

(2) $\theta$ shall be a single parameter (not a vector),

(3) $\dfrac{\partial}{\partial\theta} P(E \in w_0 \mid \theta) = 0$ if $\theta = \theta_0$,

(4) $\dfrac{\partial^2}{\partial\theta^2} P(E \in w_0 \mid \theta) \ge \dfrac{\partial^2}{\partial\theta^2} P(E \in w \mid \theta)$ if $\theta = \theta_0$, for all regions $w$ which satisfy
the preceding conditions imposed on $w_0$. There are also other types of regions
related to Type A [9]. The following
example illustrates Type A; it is a familiar problem with an unfamiliar solution.


Example 4. $p(x; \sigma) = \dfrac{1}{\sigma\sqrt{2\pi}}\, e^{-x^2/2\sigma^2}$,
$E = x_1, \dots, x_N$; $H_0$: $\sigma = \sigma_0$; $H^*$: $\sigma \ne \sigma_0$. The region
of type A is determined by two tail areas, but they are not equal tail areas of the
distribution of $\sum x^2$.

(vi) Tests unbiased in the limit [11], (vii) Asymptotically MP test [15],
(viii) ... [15]. In these cases the complete definitions are too lengthy to be
repeated here, and they cannot be recapitulated briefly. The general idea is that,
if none of the regions of the preceding types exist, still it may be true that there
are regions which do have approximately the desired properties if
$E = x_1, \dots, x_N$, and $N$ is large. The following example [11] illustrates this.

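Example 5. $p(x; \theta) = \dfrac{1}{\pi}\,\dfrac{1}{1 + (x - \theta)^2}$,
$E = x_1, \dots, x_N$; $H_0$: $\theta = 0$; $H^*$: $\theta \ne 0$. Regions of Type A
unbiased in the limit are defined by the inequality,

[Inequality garbled in the source. Recoverable fragments: sums over the sample,
the terms $1/(1 + x_i^2)$ and $(1 + x_i^2)^2$, the constants $5N/3$ and $N/2$, and
the quantity $M$.]

Here $M$ is a quantity that has to be approximated and tabulated. The inequality
is not simple, but it furnishes a definite answer to the problem.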

(ix) Regions similar to sample space. All the preceding definitions apply to
the case where $x$ is a vector in $n$ space, but not all to the case where $\theta$ is a
vector in $l$ space. Suppose now that this is the case, or, as we have said before,
that there are $l$ different parameters $\theta^{(1)}, \dots, \theta^{(l)}$, each being
capable of taking on a variety of values. Suppose we fix our attention on
$\theta^{(1)}$ and wish to test the hypothesis that $\theta^{(1)} = \theta_0^{(1)}$. First of
all we wish to find a critical region $w_0$ for which an equation like (1) will be
true, independently of what the values of the other parameters may be. Such a
region is said to be "similar" to sample space; the "similarity" consists in the
fact that the equation like (1) would be true independently of the other
parameters, if $w_0$ were replaced by all of sample space and if $\alpha = 1$. Feller
[10] has shown that there are simple cases in which there is no region similar to
sample space. He and others have investigated the conditions under which such
regions do exist. "Generally speaking it seems that for most of the probability
laws $p(x, \theta^{(1)}, \dots, \theta^{(l)})$, in which the composite probability law for
sample space is made up by multiplication,

(2) $P(E \mid \theta^{(1)}, \dots, \theta^{(l)}) = \prod_{i=1}^{N} p(x_i, \theta^{(1)}, \dots, \theta^{(l)})$,

there do exist such similar regions, at least if $N > l$."


Part II. Estimation. (i) Estimation by interval. So far we have been
considering possible answers to the question: Shall specified values of
$\theta^{(1)}, \dots, \theta^{(l)}$ be accepted? The totality of values of the $\theta$'s
which are so acceptable might be called the acceptable point set in parameter
($l$-dimensional) space. This point set is determined by the sample or experiment
($E$), and usually different point sets are determined by different $E$'s.
Frequently this set of points constitutes a simple closed region, or, in the case
of only one parameter, it may be a single interval. Such an interval is called a
fiducial or confidence interval. The fundamental property of such a point set or
interval is well known, but has to be stated with some care: If $\alpha = 0.01$, and if
one is about to take a sample from a population in which the true values of the
parameters $\theta^{(1)}, \dots, \theta^{(l)}$ are $\theta_0^{(1)}, \dots, \theta_0^{(l)}$, then
the probability is 0.99 that the sample will be such that the point set determined
by it will contain this true parameter point $\theta_0^{(1)}, \dots, \theta_0^{(l)}$.
It does not matter whether or not one knows what these true values of the
parameters are. If there is more than one parameter, the fiducial interval for
one of these parameters often does not exist; that is, there is often no such
interval which is independent of the values of the other parameters. The question
whether there is such an interval is obviously connected with the question
whether there are regions similar to sample space. But if one fiducial interval
does exist, then usually there are an infinity of them, and our problem is to
choose the best one. This problem is called "estimation by interval." One
answer is to choose the shortest interval. More precisely one should say, the
shortest system of intervals. One gets a system of intervals by fixing $\alpha$ but
not $E$. What is desired is a formula which will give the shortest interval for
every $E$, but it may well happen that one formula (system) will supply the
shortest intervals for some $E$'s, and another will supply the shortest intervals
for other $E$'s. The choice between the two systems will then depend on the
relative frequency with which the shortest intervals will be supplied by one
system or by the other.

Example 6: $p(x; \xi, \sigma)$ is normal, $\xi$ indicating the mean and $\sigma$ the
standard deviation. Given $E = x_1, \dots, x_N$; to estimate $\xi$. The shortest
system of confidence intervals does not exist (independently of $\sigma$).

Example 7. Same as Example 6, except that now one seeks only an upper
limit to the confidence interval which the parameter must not exceed. Then
the shortest system (best one-sided estimate) is: $\xi \le \bar{x} + ts$, where Fisher's
$t$ and $s$ are meant; $t$ corresponds to a preassigned $\alpha$, and $\bar{x}$ is the mean of
the sample.

In cases like Example 6, where the shortest system does not exist, Neyman
[7] defines a "short unbiased system."

Example 8. The short unbiased system for Example 6 is:
$\bar{x} - ts \le \xi \le \bar{x} + ts$, ($t$, $s$, $\bar{x}$) as in Example 7.
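The fundamental property of a confidence interval stated above (probability $1 - \alpha$ of covering the true value, whatever that value is) can be illustrated by simulation for the interval of Example 8. The following is a minimal sketch, not the author's procedure. It assumes that $s$ denotes the estimated standard error of the mean, takes $N = 10$ and $\alpha = 0.01$, and uses the tabulated value $t = 3.250$ for 9 degrees of freedom; the function names and the chosen true values are hypothetical.

```python
import random

# Tabulated two-sided 1% point of Student's t with 9 degrees of freedom.
T_VALUE = 3.250


def interval(sample, t=T_VALUE):
    """Return (xbar - t*s, xbar + t*s), with s the standard error of the mean."""
    n = len(sample)
    xbar = sum(sample) / n
    variance = sum((x - xbar) ** 2 for x in sample) / (n - 1)
    s = (variance / n) ** 0.5  # assumption: s is the standard error of xbar
    return xbar - t * s, xbar + t * s


def coverage(xi=5.0, sigma=2.0, n=10, trials=20000, seed=1):
    """Fraction of simulated samples whose interval contains the true mean xi."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(xi, sigma) for _ in range(n)]
        low, high = interval(sample)
        if low <= xi <= high:
            hits += 1
    return hits / trials


if __name__ == "__main__":
    print(coverage())
```

The observed fraction should lie close to 0.99 and should not depend on the values chosen for `xi` and `sigma`, which is the sense in which the interval exists "independently of $\sigma$".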

(ii) Single estimators. Suppose that, as before, we have a sample ($E$) and
wish to choose the best single value for one of the parameters, not as before its
best fiducial interval. It is well known that there often exists a fiducial
function $g(\theta)$ which, like a probability function, is everywhere positive or
zero and has an integral,

$$\int g(\theta)\,d\theta = 1,$$

and is further useful in determining confidence intervals. In particular, if
$\theta$ is a location parameter and if the composite probability function is as in
(2), with only one parameter $\theta$:
$g(\theta) = k\,p(x_1 - \theta) \cdots p(x_N - \theta)$, $k$ being a constant.
An estimate commonly thought of as best is the maximum likelihood estimate;
this is the mode of $g(\theta)$. Other estimates that have interesting properties are
the mean and the median of $g(\theta)$. Pitman [14] defines a new "best" estimate
$\theta_0$. This has the property that, for every $h > 0$, $\theta_0$ is within $h$ of the
true value $\theta$ more often than is any other estimate. More precisely, if

$$P(|\theta_0 - \theta| \le h) \ge P(|\theta_1 - \theta| \le h)$$

for all values of $h$, and if the inequality sign between the $P$'s holds for
some positive value of $h$, $\theta_1$ being every other estimate, then $\theta_0$ is the
"best" estimate. As before $P$ stands for probability.

Example 9. If $p(x; \sigma)$ is normal and the sample $E = x_1, \dots, x_N$, the
"best" estimate of $\sigma$ is [expression garbled in the source], instead of the
usual estimates [expressions garbled in the source].

(iii) Weight function. Wald [13] defines a weight function $W(\theta, \theta_0)$
which depends on the seriousness of the error committed when the estimate
$\theta_0$ is used in place of the true value of the parameter $\theta$. The sample,
$E = x_1, \dots, x_N$; and $\theta$ may be a vector. Thence he defines a risk function,

$$r(\theta) = \int W \cdot p(x_1, \dots, x_N \mid \theta)\,dx_1 \cdots dx_N,$$

and the "best" $\theta_0$ as that value of $\theta$ which minimizes the total risk,

$$\int r\,df(\theta),$$

this integral being taken over all of the parameter space, and $f(\theta)$ being the
a priori distribution of $\theta$. It is undesirable to introduce $f(\theta)$, but it can
be shown that, subject to slight restrictions on the nature of $f$, one can obtain a
best estimate by finding a value $\theta_0$ which for all $\theta$'s makes $r$ equal to a
constant and also satisfies other general conditions; this equation and these
conditions do not contain $f(\theta)$. In a symmetrical but otherwise fairly general
case $\theta_0$ is the maximum likelihood solution.
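The remark that a best estimate makes $r$ equal to a constant for all $\theta$ can be illustrated in the simplest location problem. The sketch below is an illustration under assumptions that are not Wald's general setting: a squared-error weight function, a normal population with unit standard deviation, and the sample mean as the estimate. All names are hypothetical. The risk is approximated by simulation rather than by evaluating the integral.

```python
import random


def weight(theta, estimate):
    """Assumed weight function: squared error of the estimate."""
    return (estimate - theta) ** 2


def risk(theta, n=5, trials=40000, seed=0):
    """Monte Carlo approximation to r(theta) when the estimate is the sample mean."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        sample = [rng.gauss(theta, 1.0) for _ in range(n)]
        total += weight(theta, sum(sample) / n)
    return total / trials


if __name__ == "__main__":
    for theta in (-3.0, 0.0, 4.0):
        print(theta, risk(theta))
```

Under these assumptions the risk is $1/N$ whatever the value of $\theta$, so the simulated values should all lie near 0.2 for $N = 5$.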

Part III. Likelihood Tests. This part has to do mostly with special cases
of likelihood tests. As is well known, this test consists in selecting a critical
rejection region $w$ in sample space where

(a) $P(w \mid H_0) = \alpha$,

(b) the relative likelihood of $H_0$ is small; more precisely, where
$\lambda <$ constant, and

$$\lambda = \frac{\max_{\omega} P(E \mid \omega)}{\max_{\Omega} P(E \mid \Omega)},$$

$\omega$ being the region in parameter space specified by the hypothesis tested $H_0$,
and $\Omega$ being the region in parameter space specified by all admissible
hypotheses. (In special cases max is replaced by least upper bound.) If $H_0$ is
simple ($\omega$ being a point) and if the CBC region $w$ exists, then $w$ is bounded by
the contour, $\lambda =$ constant [19]. Otherwise this $\lambda$ test may not yield as good
critical regions as do any of the preceding tests. But it is much easier to apply,
and, in many of the cases that follow, the $\lambda$ test is as good as judged by the
preceding theory. They are powerful tests even if they are not the most powerful
of all tests, and often this power can be found and tabulated. In fact Wilks [29]
has shown that the appropriate distribution of $\lambda$ (omitting terms of order
$1/\sqrt{N}$) can be found² if the distribution of $E$ is as

$$P(E) = \prod_{i=1}^{N} p(x_i, \theta^{(1)}, \dots, \theta^{(l)}), \qquad (N \text{ large})$$

and³ if the optimum estimates $\hat\theta^{(1)}, \dots, \hat\theta^{(l)}$ exist and are
distributed (except for certain terms of order $1/\sqrt{N}$) normally. This theorem
has now been generalized by Wald, in a paper presented to the American
Mathematical Society in December, 1941.

There are many of these tests, made to fit all sorts of hypotheses. The author
will try to summarize a considerable group of them; all members of this group
might be called generalizations of the Student-Fisher $t$-test. They fall naturally
into two classes, according as to whether the individuals of the sample are taken
from a univariate or from a multivariate universe. Unless otherwise stated all
universes shall be normal, $H_0$ shall stand for the hypothesis being tested and
$H^*$ for all admissible alternatives to $H_0$.

(i) Univariate case. The sample consists of $N$ elements, as before,
$x_1, \dots, x_N$, chosen independently from $N$ normal populations indicated by
their parameters $(\xi_1, \sigma_1), \dots, (\xi_N, \sigma_N)$. About these populations we
may ask a variety of questions resulting in a variety of problems and tests.

Problem a: If the populations are all identical ($\xi$, $\sigma$), then does
$\xi = \xi_0$ ($\xi_0$ specified in advance)? This results in the well-known $t$-test.
The hypothesis $H_0$ is that $\xi = \xi_0$, and the alternative hypothesis $H^*$ is
that $\xi \ne \xi_0$; it being assumed at the outset that all the populations are
identical. The $t$-test has been shown to be an UMPU test relative to $H^*$.
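That the likelihood test of Problem a reduces to the $t$-test can be seen by computing $\lambda$ directly. The sketch below uses hypothetical function names and the usual definition $t = (\bar{x} - \xi_0)\sqrt{N}/s$ with $s$ the sample standard deviation. It maximizes the normal likelihood under $H_0$ and under all admissible hypotheses, and the result can be compared with the standard identity $\lambda = (1 + t^2/(N-1))^{-N/2}$.

```python
import math


def likelihood_ratio(sample, xi0):
    """Lambda for H0: xi = xi0, sigma unknown, in a normal population."""
    n = len(sample)
    xbar = sum(sample) / n
    # Maximum likelihood variance when the mean is fixed at xi0 (region omega).
    var_h0 = sum((x - xi0) ** 2 for x in sample) / n
    # Maximum likelihood variance when the mean is free (region Omega).
    var_all = sum((x - xbar) ** 2 for x in sample) / n
    # The maximized normal likelihood is proportional to variance ** (-n / 2).
    return (var_all / var_h0) ** (n / 2)


def student_t(sample, xi0):
    """Student's t with N - 1 degrees of freedom."""
    n = len(sample)
    xbar = sum(sample) / n
    s2 = sum((x - xbar) ** 2 for x in sample) / (n - 1)
    return (xbar - xi0) * math.sqrt(n) / math.sqrt(s2)


if __name__ == "__main__":
    data = [4.1, 5.3, 6.0, 4.8, 5.9, 5.2]
    lam = likelihood_ratio(data, 5.0)
    t = student_t(data, 5.0)
    print(lam, (1 + t * t / (len(data) - 1)) ** (-len(data) / 2))
```

Since $\lambda$ is a decreasing function of $t^2$, the region $\lambda <$ constant is the same as the region $|t| >$ constant.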

Problems b, c, d: Let these same samples be arranged in $k$ groups or
"columns" containing $n_1, \dots, n_k$ elements, where the $n_i$ are not necessarily
all equal. Let it be assumed that the populations ($\xi$, $\sigma$) do not change
within the columns. Problems (b), (c) and (d), with their corresponding tests,
may be indicated as follows:

(b) Are ($\xi$, $\sigma$) constant from column to column? (The $\lambda_H = L$ test.)

(c) Is $\sigma$ constant from column to column regardless of what values the
$\xi$'s may have? (The $\lambda_{H_1} = L_1$ test.)

(d) Is $\xi$ constant from column to column assuming the $\sigma$'s constant? (The
$\lambda_{H_2} = L_2$ test.)

² Except for terms of order $1/\sqrt{N}$.

³ Doob, Transactions of the American Mathematical Society, Vol. 36 (1934).

In Problem (b), $H_0$ is that ($\xi$, $\sigma$) are constant, $H^*$ that they are not
constant. In Problem (c), $H_0$ is that $\sigma$ is constant, $H^*$ that it is not
constant. In Problem (d), $H_0$ is that $\xi$ is constant, $H^*$ that it is not
constant. The test of Problem (c) has recently been shown to be unbiased only if
the numbers in all the columns are the same ($n_1 = \cdots = n_k$). It is, however,
unbiased in the limit. Power tables were published in 1937 [23]. Bartlett's
(1937) $\mu$ is another test for this problem, and Pitman's [36] $L$ test is another,
but it has been shown that these two are equivalent. Both are unbiased; they are
not likelihood tests. This problem is frequently called the problem of the
"homogeneity" of a set of variances.

All these tests are, of course, functions of the observations, and the details
are readily available in the papers cited. For example, Pitman's

[formula for $L$ garbled in the source]

where the quantity attached to the $i$th column is what he calls the "squariance"
for that column, and a large value of $L$ is significant. The squariance is what
the physicists had called and what statisticians ought therefore to have called
the second moment, viz.: $N\mu_2$; $\mu_2$ is really the unit second moment.

(e) Linear hypotheses. Problems like the above, and many others, can be
included in a general theorem by Kolodziejczyk, who showed how to write out
quite simply the likelihood test if each $\xi$ is a linear function of $l$ parameters
($l < N$) and if the hypothesis $H_0$ specifies the values of $r$ different linear
functions of the $\xi$'s ($r \le l$). Furthermore, the power of this test (with
numerous applications) was discussed and tabulated by Tang in an important
paper [39].

Problem (f). This method (e) has been used by Neyman [43] to test the
homogeneity of a set of variances, the problem already studied by a number of
authors. It has been stated that some of their tests were unbiased with respect
to the alternative hypothesis that the $\sigma$'s were not all equal. Neyman gives
reasons for supposing, in the industrial problem he is considering, that it would
be more realistic to consider another alternative hypothesis, namely, $H^*$ that
the $\sigma$'s are not all equal and that their distribution can be approximately
described by saying that $1/\sigma^2$ has a $\chi^2$ distribution. No UMP test exists
but there does exist a critical region whose power, with respect to a sub-family
of $H^*$, is independent of the means, and the corresponding test is the most
powerful test for this sub-family of alternatives. Tables of its power are
furnished. More applications are promised.

(ii) Multivariate case. The sample consists of $N$ elements, exactly as before,
except that now each $x$ is a vector in $n$ space and comes from a multivariate
normal universe whose means may be represented again by $\xi$ if we think of $\xi$
as being a vector in $n$ space. The other parameters of this universe are the
variances and covariances $\sigma_{ij}$. So, with these changes, we may repeat the
statement at the beginning of (i) that the sample is $x_1, \dots, x_N$, and that the
populations are $(\xi_1, \sigma_{ij1}), \dots, (\xi_N, \sigma_{ijN})$. The questions to be
asked about these populations correspond exactly to those asked in the simpler
case.

Problem (a): If the populations are all identical ($\xi$, $\sigma_{ij}$) does
$\xi = \xi_0$ (specified in advance)? The answer is given by Hotelling's $T$ test.
The hypothesis tested is $H_0$ that the vector $\xi = \xi_0$, and the alternative
hypothesis $H^*$ is that these two vectors are not identical. P. Hsu [28] has shown
that this test is the most powerful in a special sense, and has given a new
demonstration of it by the use of the Laplace transform. Incidentally he has
shown that the Laplace transform of an elementary probability law determines
the law uniquely except perhaps at a null set of points.

Problems (b), (c), (d): Now let the same sample be arranged in $k$ groups or
columns, as in (i) b, c, d; and let it be assumed that the populations
($\xi$, $\sigma_{ij}$) do not change within the columns. Problems (b), (c), and (d),
with their corresponding tests, may be indicated as follows:

(b) Are ($\xi$, $\sigma_{ij}$) constant from column to column? (The $\lambda_{\ldots}$ test.)

(c) Are $\sigma_{ij}$ constant from column to column regardless of what values the
$\xi$'s may have? (The $\lambda_{\ldots}$ test.)

(d) Is the vector $\xi$ constant from column to column assuming the $\sigma_{ij}$
constant from column to column? (The $\lambda_{\ldots}$ test.)

Unfortunately, in the customary notation, the $\lambda$'s for this case (ii) do not
follow the pattern adopted in (i). It would be better to put ($n$) after each of
the $\lambda$'s (or $L$'s) in (i) to signify the corresponding tests in (ii). But, even
if this were agreed upon, there would still be a confused notation because there
are many other "$\lambda$" and "$L$" tests besides those listed here. Apparently⁴ the
power functions of these last three multivariate tests have not been found yet.

(e) The linear hypothesis theory was shown to be applicable to the
multivariate case in a special instance by P. Hsu in 1940 [38]. Since then he has
generalized it further [45].

(iii) Bivariate case. This important special case of (ii) has now been pretty
thoroughly solved. A general summary of various tests which have been devised
by Finney, Pitman, Morgan, Wilks, and E. S. Pearson was given by C. Hsu in
1940 [42], with some slight additions and with tables of power functions with
respect to certain alternatives. Altogether there are seven of these tests
corresponding to seven different problems, including the four just referred to as
Problems a, b, c, and d.

⁴ So far as the author is aware; but he does not pretend to have made a careful
search.

Part IV. The Method of Randomization. This part concerns randomization
of the individuals within a sample to obtain a method of testing hypotheses
without making use of any characteristic of the population from which the
sample was drawn. It does not deal with randomization in field experiments to
offset the effect of variable fertility. Also, in this discussion, the hypothesis
being tested is not that the sample was a random sample. It is assumed that the
given sample is random. We begin with an example from Pitman. Two samples,
$(x_1, \dots, x_m)$ and $(y_1, \dots, y_n)$ have been drawn at random from two
populations. The means of the samples are $\bar{x}$ and $\bar{y}$, respectively. Let
$|\bar{x} - \bar{y}|$ be called the spread of these samples. Now rearrange these same
$x$'s and $y$'s with each other in all possible ways to obtain all possible spreads.
The larger the observed spread, among all these possible spreads, the more
significant it is supposed to be as a test of the (null) hypothesis that the two
populations were identical. Similarly, tests have been devised for correlations,
variances, etc.
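The rearrangement procedure just described can be sketched directly for small samples. The code below is an illustration with hypothetical names, not Pitman's own computation: it enumerates every division of the pooled observations into groups of the original sizes and reports the fraction of divisions whose spread is at least as large as the observed one. Reading that fraction as a significance level is the usual interpretation and is an assumption here.

```python
from itertools import combinations


def spread(xs, ys):
    """Absolute difference between the two sample means."""
    return abs(sum(xs) / len(xs) - sum(ys) / len(ys))


def randomization_level(xs, ys):
    """Fraction of all rearrangements with spread >= the observed spread."""
    pooled = list(xs) + list(ys)
    m = len(xs)
    observed = spread(xs, ys)
    count = 0
    total = 0
    for chosen in combinations(range(len(pooled)), m):
        picked = set(chosen)
        new_x = [pooled[i] for i in range(len(pooled)) if i in picked]
        new_y = [pooled[i] for i in range(len(pooled)) if i not in picked]
        total += 1
        # Small tolerance so that ties with the observed spread are counted.
        if spread(new_x, new_y) >= observed - 1e-12:
            count += 1
    return count / total


if __name__ == "__main__":
    print(randomization_level([1, 2, 3], [10, 11, 12]))
```

The enumeration grows as the binomial coefficient of $m + n$ over $m$, which is the source of the long computations mentioned later in the text.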

E. S. Pearson [51] in 1938 published a criticism of this general theory which
in substance seems to be that the reason why one calls the largest spreads
significant, rather than the smallest ones, in the illustration just used, is that
one is assuming tacitly that the admissible populations are such that large
spreads would be more likely on some other than the null hypothesis; that if one
does not make some such implicit assumption, then one might quite as well call
the smallest spreads significant; and that therefore, barring such implicit
assumptions, one can control only errors of the first kind by this method.

It seems to the author that Pearson's criticism is sound, and that, if indeed
one is unwilling to make any assumption whatever about the populations
considered, then this device is of no⁵ value in testing the null hypothesis. For,
if all that one pretends to do is to control errors of the first kind, one can do
that by consulting a table of random numbers of two digits. Thus one can control
errors of the first kind without performing the experiment at all, let alone
making the long computations usually required by the method of randomization.
Or, better, one can reduce that error to zero simply by making up one's mind
that one will never reject the hypothesis being tested: certainly one will never
reject it improperly if one never rejects it at all.

However, if one is willing to make in the illustration used the very mild
assumption that the populations considered are such that unusually large
spreads would more probably be obtained from some admissible hypothesis
other than the null hypothesis, then it would seem to the author that the method
would be useful. Similar remarks apply to the tests for correlations, variances,
etc.

⁵ Pearson's language is not so strong as this. He says "perhaps it should be
described as a valuable device rather than a fundamental principle."

REFERENCES

Part I. Theory of Tests and Part II. Estimation

[1] J. Neyman and E. S. Pearson, "On the problem of the most efficient tests of
statistical hypotheses," Phil. Trans. Roy. Soc., A, Vol. 231 (1933), pp. 289-337.

[2] J. Neyman and E. S. Pearson, "The testing of statistical hypotheses in relation
to probabilities a priori," Camb. Phil. Soc. Proc., Vol. 29 (1933), pp. 492-510.

[3] S. S. Wilks, "Test criteria for statistical hypotheses involving several
variables," Jour. Am. Stat. Asso., Vol. 30 (1935), pp. 549-560.

[4] J. Neyman and E. S. Pearson, "Contributions to the theory of testing statistical
hypotheses. I. Unbiased critical regions of type A and type A1," Stat. Res. Mem.,
Univ. of London, Vol. 1 (1936), pp. 1-37.

[5] J. Neyman and E. S. Pearson, "Sufficient statistics and uniformly most powerful
tests of statistical hypotheses," Stat. Res. Mem., Vol. 1 (1936), pp. 113-137.

[6] E. J. G. Pitman, "The closest estimates of statistical parameters," Camb. Phil.
Soc. Proc., Vol. 33 (1937), pp. 212-222.

[7] J. Neyman, "Outline of a theory of statistical estimation based on the classical
theory of probability," Phil. Trans. Roy. Soc., A, Vol. 236 (1937), pp. 333-380.

[8] J. Neyman, "On statistics the distribution of which is independent of the
parameters involved in the probability law of the original variables," Stat. Res.
Mem., Vol. 2 (1938), pp. 58-59.

[9] J. Neyman and E. S. Pearson, "Contributions to the theory of testing statistical
hypotheses," Stat. Res. Mem., Vol. 2 (1938), pp. 25-57.

[10] W. Feller, "Note on regions similar to the sample space," Stat. Res. Mem.,
Vol. 2 (1938), pp. 117-125.

[11] J. Neyman, "Tests of statistical hypotheses which are unbiased in the limit,"
Annals of Math. Stat., Vol. 9 (1938), pp. 69-86.

[12] S. S. Wilks, "Fiducial distributions in fiducial inference," Annals of Math.
Stat., Vol. 9 (1938), pp. 272-280.

[13] A. Wald, "Contributions to the theory of statistical estimation and testing
hypotheses," Annals of Math. Stat., Vol. 10 (1939), pp. 299-326.

[14] E. J. G. Pitman, "The estimation of the location and scale parameters of a
continuous population of any given form," Biometrika, Vol. 30 (1939), pp. 391-421.

[15] A. Wald, "Asymptotically most powerful tests of statistical hypotheses,"
Annals of Math. Stat., Vol. 12 (1941), pp. 1-19.
Part III. Likelihood Tests

[16] H. Hotelling, "Generalized t-test," Annals of Math. Stat., Vol. 2 (1931),
pp. 360-378.

[17] S. S. Wilks, "Certain generalizations in the analysis of variance," Biometrika,
Vol. 24 (1932), pp. 471-494.

[18] E. S. Pearson and S. S. Wilks, "Methods of statistical analysis appropriate for
k samples of two variates," Biometrika, Vol. 25 (1933), pp. 353-378.

[19] J. Neyman and E. S. Pearson, "On the problem of the most efficient tests of
statistical hypotheses," Phil. Trans. Roy. Soc., A, Vol. 231 (1933), pp. ... and 337.

[20] B. L. Welch, "Some problems in the analysis of regression among k samples of
two variables," Biometrika, Vol. 27 (1935), pp. 145-160.

[21] S. S. Wilks, "Test criteria for statistical hypotheses involving several
variables," Jour. Am. Stat. Asso., Vol. 30 (1935), pp. 549-560.

[22] P. P. N. Nayer, "An investigation into the application of Neyman and Pearson's
$L_1$ test, with tables of percentage limits," Stat. Res. Mem., Vol. 1 (1936),
University College, London, pp. 38-56.

[23] S. S. Wilks and C. M. Thompson, "The sampling distribution of the criterion
$\lambda_{H_1}$ when the hypothesis tested is not true," Biometrika, Vol. 29 (1937),
pp. 124-132.

[24] D. J. Finney, "The distribution of the ratio of estimates of the two variances
in a sample from a normal bivariate population," Biometrika, Vol. 30 (1938),
pp. 190-192.

[25] D. N. Lawley, "A generalization of Fisher's z-test," Biometrika, Vol. 30 (1938),
pp. 180-187.

[26] P. L. Hsu, "Contribution to the theory of 'Student's' t-test as applied to the
problem of two samples," Stat. Res. Mem., Vol. 2 (1938), pp. 1-24.

[27] P. L. Hsu, "On the best unbiased quadratic estimate of the variance," Stat. Res.
Mem., Vol. 2 (1938), pp. 91-104.

[28] P. L. Hsu, "Notes on Hotelling's generalized T," Annals of Math. Stat., Vol. 9
(1938), pp. 231-243.

[29] S. S. Wilks, "The large sample distribution of the likelihood ratio for testing
composite hypotheses," Annals of Math. Stat., Vol. 9 (1938), pp. 60-62.

[30] F. N. David and J. Neyman, "Extension of the Markoff theorem on least squares,"
Stat. Res. Mem., Vol. 2 (1938), pp. 105-116.

[31] D. J. Bishop and U. S. Nair, "A note on certain methods of testing for the
homogeneity of a set of estimated variances," Jour. Roy. Stat. Soc., Supp., Vol. 6
(1939), pp. 89-99.

[32] G. W. Brown, "On the power of the $L_1$ test for the equality of several
variances," Annals of Math. Stat., Vol. 10 (1939), pp. 119-128.

[33] E. J. G. Pitman, "A note on normal correlation," Biometrika, Vol. 31 (1939),
pp. 9-12.

[34] W. A. Morgan, "A test for the significance of the difference between the two
variances in a sample from a normal bivariate population," Biometrika, Vol. 31
(1939), pp. 13-19.

[35] D. J. Bishop, "On a comprehensive test for the homogeneity of variances and
covariances in multivariate problems," Biometrika, Vol. 31 (1939), pp. 31-55.

[36] E. J. G. Pitman, "Tests of hypotheses concerning location and scale parameters,"
Biometrika, Vol. 31 (1939), pp. 200-215.

[37] P. O. Johnson and J. Neyman, "Tests of certain linear hypotheses and their
application to some educational problems," Stat. Res. Mem., Vol. 1 (1936),
pp. 57-93.

[38] P. L. Hsu, "On generalized analysis of variance, I," Biometrika, Vol. 31 (1940),
pp. 221-237.

[39] P. C. Tang, "The power function of the analysis of variance tests with tables
and illustrations of their use," Stat. Res. Mem., Vol. 2 (1938), pp. 126-149.

[40] H. O. Hartley, "Testing the homogeneity of a set of variances," Biometrika,
Vol. 31 (1940), pp. 249-255.

[41] J. F. Daly, "On the unbiased character of likelihood ratio tests for
independence in normal systems," Annals of Math. Stat., Vol. 11 (1940), pp. 1-32.

[42] C. T. Hsu, "On samples from a normal bivariate population," Annals of Math.
Stat., Vol. 11 (1940), pp. 410-426.

[43] J. Neyman, "A statistical problem arising in routine analysis and in sampling
inspections of mass production," Annals of Math. Stat., Vol. 12 (1941), pp. 46-76.

[44] A. Wald and R. J. Brookner, "On the distribution of Wilks' statistic for testing
the independence of several groups of variates," Annals of Math. Stat., Vol. 12
(1941), pp. 137-152.

[45] P. L. Hsu, "Canonical reduction of the general regression problem," Annals of
Eugenics, Vol. 11 (1941), pp. 42-46.

Part IV. Randomization Tests

[46] E. J. G. Pitman, "Significance tests which may be applied to samples from any
population, (I)," Jour. Roy. Stat. Soc. Supp., Vol. 4 (1937), pp. 119-130.

[47] E. J. G. Pitman, "Significance tests which may be applied to samples from any
population, (II), The correlation coefficient test," Jour. Roy. Stat. Soc. Supp.,
Vol. 4 (1937), pp. 225-232.

[48] E. J. G. Pitman, "Significance tests which may be applied to samples from any
population, (III), The analysis of variance test," Biometrika, Vol. 29 (1938),
pp. 322-335.

[49] B. L. Welch, "On the z-test in randomized blocks and Latin squares,"
Biometrika, Vol. 29 (1937), pp. 21-52.

[50] B. L. Welch, "On tests for homogeneity," Biometrika, Vol. 30 (1938),
pp. 149-158.

[51] E. S. Pearson, "Some aspects of the problem of randomization," Biometrika,
Vol. 29 (1938), pp. 53-64.

Note: None of the 1941 Biometrika was received until after this paper had been
read and prepared for publication.



RECENT ADVANCES IN MATHEMATICAL STATISTICS, II¹

By Cecil C. Craig
University of Michigan


The statistical theory of the linear relationship between a dependent variable
$x_1$, and a set of independent variables $x_2, x_3, \dots, x_{t+1}$, is by now quite
generally understood. Supposing that the $x$'s are measured from their respective
means, we determine the coefficients, $b_2, b_3, \dots, b_{t+1}$ in
$u = b_2 x_2 + \cdots + b_{t+1} x_{t+1}$ so as to maximize the coefficient of
correlation $r_{1 \cdot 23 \cdots t+1}$ between $x_1$ and $u$. This coefficient of
correlation, usually called the multiple correlation coefficient, measures the
exactness of the linear relationship that exists, and it has the property of being
quite unchanged if the origins or the scales for the separate $x_i$'s are changed
in any way or even if the set $x_2, x_3, \dots, x_{t+1}$ should be replaced by any
equivalent set of linear combinations of them. That is, e.g., if $t = 3$, the new
variables, $v_1 = x_2 + x_3 + x_4$, $v_2 = 2x_2 - x_3 + 3x_4$,
$v_3 = x_2 - 2x_3 - 2x_4$ are equivalent to $x_2, x_3, x_4$, since the latter can be
found if the $v$'s are known, and the multiple correlation between $x_1$ and these
$v$'s is exactly the same as that between $x_1$ and $x_2, x_3, x_4$. Moreover, the
requisite sampling theory if the variables involved are normally distributed is
well established.

I want to discuss briefly an important generalization of this kind of situation
that has been the subject of recent research. In particular, in his paper,
"Relations between two sets of variates," published in Biometrika in 1936,
H. Hotelling set forth these ideas in excellent fashion and contributed much to
the mathematical theory required for their practical application. We now suppose
that we have two sets of measurements, $x_1, \dots, x_s$, and
$x_{s+1}, \dots, x_{s+t}$, made on the same object and that we are interested in the
linear relations that may exist between the members of one set and the members
of the other. As an example, $x_1, \dots, x_s$ might be the prices of $s$ more or less
related commodities at a given time, and $x_{s+1}, \dots, x_{s+t}$ measures of factors
which may be thought to be effective in the price situation.

In the more special case I began with, $s = 1$, and a single equation fully
expressed the linear statistical relationship of $x_1$ with $x_2, \dots, x_{t+1}$.
Now there are $s$ dependent variables and now with $s \le t$, not one but $s$
distinct linear relations will exist and will be required to fully describe the
linear connections between the two sets of variables. We may assume that there
is no mere duplication among the variables we are using, i.e., no one of the $s$
$x$'s is always exactly given by a linear combination of the others in the set and
the same is true of the $x_{s+1}, \dots, x_{s+t}$. Now there is no logical or
mathematical necessity for the $x$'s which we are so far using as our
measurements. Suppose $s = 2$ and $t = 3$. We can find the best linear regression
equation for $x_1$ on $x_3, x_4, x_5$ and then find the like equation for $x_2$ on
$x_3, x_4, x_5$. But we could very possibly get more meaning out of the situation if
we began by replacing $x_1$ and $x_2$ by, say, $u_1 = x_1 + x_2$ and
$u_2 = x_1 - x_2$ and similarly replacing $x_3, x_4, x_5$ by three $v$'s formed from
these three $x$'s in a similar fashion. We have really been making a quite
arbitrary choice among the $u$'s and $v$'s that could be used and the question
presents itself, What significance is there in the way we choose our $u$'s and
$v$'s?

¹ ... American Statistical Association, the Econometric Society, ... Mathematical
Statistics, on December ..., 1941, in ... The authors selected topics from papers
published during the past five years.

It turns out to be much more than a merely reasonable beginning to try to
determine a $u$ from the first set and a $v$ from the second in such a way that they
will be more closely correlated than any other $u$ and $v$ formed in this linear
fashion from the $s$ $x$'s in the first set and the $t$ $x$'s in the second. That is, we set,

$$u = \sum_{i=1}^{s} a_i x_i \quad \text{and} \quad v = \sum_{j=s+1}^{s+t} b_j x_j,$$

and determine the $a_i$'s and the $b_j$'s which will maximize $r_{uv}$. We may say that
this $u$ and $v$ will account for more of the linear dependence of $x_1, \dots, x_s$ upon
$x_{s+1}, \dots, x_{s+t}$ than will any other $u$ and $v$. To the mathematician familiar
... things begin to appear, though, as Hotelling remarks, in its purely mathematical
form the problem seems to be new. A very important observation is the fact
that this maximum $r_{uv}$ would be quite unaffected by any change in origin or
scale on any of the $x$'s; it is even unaffected if we should begin by replacing the
first $s$ $x$'s by any equivalent set of $s$ linear combinations of them as new variables
to work with and by doing the same thing on the second set of $t$ $x$'s. Hotelling
makes use of this circumstance to greatly simplify his mathematical
developments.

Now things fall out in a very interesting way. One actually solves not for the
$a$'s and $b$'s at first but instead for the maximized $r_{uv}$. Having this the
corresponding $a$'s and $b$'s can then be found. But generally the equation for $r_{uv}$
gives not one but $s$ different values for $r_{uv}$. What is the meaning of the $s$
different $r_{uv}$'s? Well, you remember that I said that $s$ relations ($s \le t$) would
appear to exist between the two sets of variables. These $s$ $r_{uv}$'s correspond to
those $s$ linear relations which are picked out in a unique way. We now have
$s$ $u, v$ pairs which are independent of each other in the sense that no $u$ or $v$ is
correlated with any other $u$ or $v$ with the exception of the other member of its
pair, and of course this correlation is precisely the $r_{uv}$ by which the pair was
determined. Further, the largest $r_{uv}$ gives the maximum $u$ and $v$ we set out
to find; the second largest determines the pair $u, v$ of maximum correlation
among those independent, in the sense just described, of the first pair; the third
largest leads to the $u, v$ of maximum correlation among those independent of
the first two pairs, and so on. The $s$ independent linear relations among them
completely describe the linear statistical dependence of the one set of variables
upon the other. The relations are essentially those between the $u, v$ pairs and





CECIL C. CRAIG


the closeness of these are measured by $r_1, r_2, \dots, r_s$, which I write for the
$s$ $r_{uv}$'s. The new variables are called canonical variables and the correlations
between them canonical correlations. We may say that the maximum pair,
$u, v$, gives both the best linear predictor that can be formed from $x_{s+1}, \dots, x_{s+t}$
and also the linear combination of $x_1, \dots, x_s$ that can be best predicted.

I have to try to deal briefly with the numerous ideas and results in this paper
which is not unrelated to earlier work by the author and by S. S. Wilks. First,
what about an over-all measure of the linear connection between the two sets
of variables? It is shown that

$$q = \pm r_1 r_2 \cdots r_s \quad \text{and} \quad z = (1 - r_1^2)(1 - r_2^2) \cdots (1 - r_s^2),$$

have properties that make it appropriate to call the first the (vector) correlation
coefficient between the two sets and the second the coefficient of alienation.
Both are simply expressed by means of determinants of the covariances (product
moments) among the $s + t$ $x$'s. For example, if $s = 1$, $q$ is simply the multiple
correlation coefficient of $x_1$ on $x_2, \dots, x_{1+t}$. If $s = t = 2$,

$$q = \frac{r_{13} r_{24} - r_{14} r_{23}}{\sqrt{(1 - r_{12}^2)(1 - r_{34}^2)}},$$

the numerator of which is the tetrad difference of the psychologists. Further,
if it should happen that $x_1$ and $x_4$ are identical, this $q$ becomes (apart from
sign) the partial correlation $r_{23 \cdot 1}$.
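For the case $s = t = 2$ these over-all measures can be computed directly from the six correlations. The Python sketch below is only an illustration under stated assumptions: it takes $z$ to be the determinant of the full correlation matrix divided by the product of the determinants of the two within-set blocks (one reading of "simply expressed by means of determinants"), and it recovers $r_1, r_2$ from the relations $r_1^2 r_2^2 = q^2$ and $(1 - r_1^2)(1 - r_2^2) = z$. The function names and the numerical correlation matrix are hypothetical and are not taken from Hotelling's paper.

```python
import math


def det(m):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in m]
    n = len(a)
    d = 1.0
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(a[r][c]))
        if abs(a[p][c]) < 1e-15:
            return 0.0
        if p != c:
            a[c], a[p] = a[p], a[c]
            d = -d
        d *= a[c][c]
        for r in range(c + 1, n):
            f = a[r][c] / a[c][c]
            for k in range(c, n):
                a[r][k] -= f * a[c][k]
    return d


def canonical_two_by_two(R):
    """Return (q, z, r1, r2) from a 4x4 correlation matrix.

    Variables 1, 2 form the first set and 3, 4 the second (s = t = 2).
    """
    r12, r13, r14 = R[0][1], R[0][2], R[0][3]
    r23, r24, r34 = R[1][2], R[1][3], R[2][3]
    blocks = (1 - r12 ** 2) * (1 - r34 ** 2)
    q = (r13 * r24 - r14 * r23) / math.sqrt(blocks)
    z = det(R) / blocks                      # assumed determinant form of z
    total = 1 + q * q - z                    # r1^2 + r2^2
    disc = math.sqrt(max(total * total - 4 * q * q, 0.0))
    r1 = math.sqrt((total + disc) / 2)
    r2 = math.sqrt(max((total - disc) / 2, 0.0))
    return q, z, r1, r2


# Hypothetical correlations: x1 pairs with x3, x2 pairs with x4.
R = [[1.0, 0.0, 0.5, 0.0],
     [0.0, 1.0, 0.0, 0.3],
     [0.5, 0.0, 1.0, 0.0],
     [0.0, 0.3, 0.0, 1.0]]
q, z, r1, r2 = canonical_two_by_two(R)
```

In this constructed case the canonical correlations are simply the two non-zero cross correlations, so the sketch can be checked by eye.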

In an application, of course, the various quantities appearing above will have
to be calculated from an observed set of values of $x_1, \dots, x_s, x_{s+1}, \dots, x_{s+t}$.
Hotelling adapts an iterative process he had previously given to calculating the
canonical $r_1, \dots, r_s$, from which the canonical variables can be found, and he
numerically illustrates the whole procedure. But what is more difficult is to
solve the sampling problems that arise. It is very helpful to assume that all
the $x$'s obey a multiple normal frequency law.

First, Hotelling derives expressions for the standard errors of the $r$'s and of
$q$ and $z$ which are approximations useful for large samples. But for small
samples exact sampling distributions are needed. Wilks [2] had earlier studied
the exact sampling distribution of $z$ in the case in which we are interested, that
in the population the set $x_1, \dots, x_s$ is completely independent of the set
$x_{s+1}, \dots, x_{s+t}$, though he did not leave his general result in a form suitable for
calculation. Hotelling now finds the distribution function for $q$ for $s = 2$.
The result is not in all cases simple in form but numerical values can be obtained
from it. The relations between these two possible tests, one based on $z$ and the
other based on $q$, are discussed at length.

An obvious undertaking would be to try to find the exact joint sampling
distribution of the canonical correlations for any $s$ and $t$, and I will say
something about the very interesting papers in which this problem was solved. But
some of this later work arose in a different though related setting which I want
to discuss briefly first.

In 1936 R. A. Fisher published "The use of multiple measurements in taxonomic
problems," [3] which was the introduction of linear discriminant functions
to the statistical world. Suppose that $N_1$ random individuals of one race
(species, variety, etc.) have been measured with respect to each of $k$
characteristics and that $N_2$ random individuals of another race have been similarly
measured. What linear combination of these measurements would serve best
to distinguish members of one race from those of the other? An example used
by Fisher in this paper was that of two samples of 50 plants each of two varieties
of iris found growing together in the same colony. In the flower on each plant
there was measured the sepal length, $x_1$, the sepal width, $x_2$, the petal length,
$x_3$, and the petal width, $x_4$. What linear function,

$$X = \lambda_1 x_1 + \lambda_2 x_2 + \lambda_3 x_3 + \lambda_4 x_4,$$

would enable one to most surely identify the variety to which each single plant
belongs? To choose such an $X$ Fisher proposed the mathematical principle that
the coefficients, $\lambda_i$, $i = 1, 2, 3, 4$, be determined so that the difference in the
average value of $X$ in the one variety and the average value in the other divided
by the sum of squares of the $X$'s taken about the two group means shall be a
maximum. Then quite simple mathematics leads to the required numerical
values of the $\lambda_i$'s.

But now that we have set up such an instrument as $X$, there is a more interesting
use to which it can be put. Suppose that the question were to establish
that the $N_1$ individuals from the one group and the $N_2$ individuals from the other
really belong to different races distinguishable with respect to the complex of
characters we have chosen to measure in each. We are on the old question of
racial likeness or unlikeness and obviously the word "race" may have a meaning
broad enough to give this work of Fisher's wide application indeed. Subject
to the principle according to which the coefficients $\lambda_i$ are determined from sample
sets of measurements, $X$ is the best possible linear discriminant function. We
are now faced with the question of the statistical significance of the difference
between the means of $X$ for each group compared to the above mentioned
internal sum of squares.

It is generally useful and enlightening in a problem of this general nature
turning on the use of linear and quadratic forms to consider its interpretation
as an analysis of variance or covariance. Fisher readily provides such a set-up
in this case by assigning to the quality of belonging to race A a numerical value,
$y_1$, the same for all members of that race, and by assigning in like fashion a
different numerical value, $y_2$, to the quality of belonging to race B. It is
mathematically convenient if we have samples of $N_1$ and $N_2$ from races A and B
respectively, to let

$$y_1 = \frac{N_2}{N_1 + N_2} \quad \text{and} \quad y_2 = -\frac{N_1}{N_1 + N_2},$$

for then over the combined sample of $N_1 + N_2$, we have,

$$S(y) = 0 \quad \text{and} \quad S(y^2) = \frac{N_1 N_2}{N_1 + N_2}.$$





This may seem somewhat arbitrary at first glance, but let us start anew by
writing the linear regression equation,

$$y = \sum_{i=1}^{k} b_i (x_i - \bar{x}_i),$$

in which $y$ takes on one of the two values above and in which $\bar{x}_i$ is the mean of
$x_i$ in the combined sample, and then proceeding to determine the $b_i$'s in the usual
least squares fashion. The $b_i$'s turn out to be proportional to the $\lambda_i$'s previously
found. Now the total variance of the $y$'s is analyzed into that within groups
and that between groups and it is immediately suggested that the usual $z$-test
with $k$ and $N - k - 1$ degrees of freedom ($N = N_1 + N_2$) is the appropriate one.
But, as Fisher remarks, ordinarily for the application of this test one postulates a
population in which the $y$'s have a normal distribution for each fixed set of values of
$x_1, x_2, \dots, x_k$. Here, however, the $y$ remains fixed and one postulates a
normal distribution of the $x$'s associated with a given value of $y$. Not to leave
this matter in doubt, though I shall return to it, I may remark that Fisher
noted that earlier work by Hotelling [4] showed that the $z$-test is nevertheless
the proper one to use.
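The proportionality between the regression coefficients and the discriminant coefficients can be checked numerically. The following Python sketch is a minimal illustration for $k = 2$ characters, not Fisher's own computation: the measurements are made up, the helper names are hypothetical, and the discriminant coefficients are taken (as an assumption about the standard solution) to be the within-group matrix of sums of squares and products applied inversely to the difference of the group means.

```python
def solve2(m, v):
    """Solve the 2x2 linear system m * w = v by Cramer's rule."""
    d = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [(v[0] * m[1][1] - m[0][1] * v[1]) / d,
            (m[0][0] * v[1] - v[0] * m[1][0]) / d]


def mean(rows):
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(2)]


def ssp(rows, centre):
    """Sums of squares and products of the rows about a given centre."""
    m = [[0.0, 0.0], [0.0, 0.0]]
    for r in rows:
        d = [r[0] - centre[0], r[1] - centre[1]]
        for i in range(2):
            for j in range(2):
                m[i][j] += d[i] * d[j]
    return m


# Made-up measurements of two characters on two small groups.
race_a = [(5.0, 3.4), (4.6, 3.1), (5.2, 3.7), (4.9, 3.0)]
race_b = [(6.4, 2.9), (5.9, 3.2), (6.8, 2.8), (6.1, 2.6), (6.6, 3.0)]
n1, n2 = len(race_a), len(race_b)

# Discriminant coefficients from the within-group sums of squares and products.
ma, mb = mean(race_a), mean(race_b)
wa, wb = ssp(race_a, ma), ssp(race_b, mb)
within = [[wa[i][j] + wb[i][j] for j in range(2)] for i in range(2)]
lam = solve2(within, [ma[0] - mb[0], ma[1] - mb[1]])

# Least squares regression of the dummy variate y on the x's.
y1, y2 = n2 / (n1 + n2), -n1 / (n1 + n2)
combined = race_a + race_b
ys = [y1] * n1 + [y2] * n2
mc = mean(combined)
total = ssp(combined, mc)
sxy = [sum((r[j] - mc[j]) * y for r, y in zip(combined, ys)) for j in range(2)]
b = solve2(total, sxy)

sum_y = sum(ys)                      # should vanish
sum_y2 = sum(y * y for y in ys)      # should equal n1 * n2 / (n1 + n2)
proportionality = b[0] * lam[1] - b[1] * lam[0]   # zero when b is proportional to lam
```

The last quantity vanishes (up to rounding) exactly when the two coefficient vectors are proportional, which is the property stated in the text.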

I have to be brief indeed concerning linear discriminant functions. Fisher
wrote further papers dealing with them in 1938 [5], 1939 [6], and 1940 [7] and
among others, Mahalanobis [8], Bose [9, 10], and Roy [10], of the "Calcutta
School" have made relevant contributions. In particular, Mahalanobis [8]
introduced the concept of the generalized distance by which two sets of multiple
measurements differ, which has an obvious connection with the present subject.
Fisher also discussed a test for the direction in $k$-space in which two such samples
differ most and in case we have three such samples from three different races
provided a test for their collinearity.

In his 1939 paper mentioned above [6], Fisher called attention to the
connections between the theory of linear discriminant functions and Hotelling's
canonical correlations. Of course it can be said at once that a linear discriminant
function arises as the very special case of investigating the linear relationship
between the artificially introduced $y$ and $x_1, x_2, \dots, x_k$. And the test of
significance based on the analysis of variance turns on the ratio of the sum of
squares due to regression, i.e., among the predicted values, to the total sum of
squares for the regression and for the residuals. This analysis is quite general
in form and can equally well be set up if one is predicting linear forms formed
from $N_1$ variables from linear forms made up from $N_2$ other variables. If one
sets up the condition that this ratio be a maximum one is led, as Fisher shows,
to a determinantal equation in $\theta$, the roots of which are the squares of
Hotelling's canonical correlations.

Mathematically the general problem we are interested in is equivalent to the
following: We have a sample of $N_1 + N_2$ observed values of $p$ normally
distributed variables. If $a_{ij}$ is the covariance of the $i$-th and $j$-th variables in the
sample of $N_1$ and $b_{ij}$ the like covariance in the sample of $N_2$ we want the sampling
distribution of the roots of the determinantal equation,

$$| a_{ij} - \theta (a_{ij} + b_{ij}) | = 0,$$

under the hypothesis that the first sample is independent of the second. This
problem Fisher solved in his 1939 paper though in his characteristically concise
and intuitive manner. But in the same number of the Annals of Eugenics,
P. L. Hsu [11], at Fisher's suggestion, gave a complete analytical solution. Hsu
also showed more in detail how the result applies to Hotelling's case of $N$
observations on $s + t$ normally distributed variables in which the set of $s$ is
independent of the second set of $t$. In his 1936 paper Hotelling gave the result for
... 2 and in 1939, Girshick [12] gave the solution for $s = 2$ and $t > 2$.
Hsu showed, too, the striking fact, mentioned by Fisher, that it is sufficient
that only one of the two sets of $s$ and of $t$ variables be normally distributed in
order that the distribution function found apply. This provides the explanation
of why the test of significance applied by Fisher for linear discriminant functions
is valid even though the $y$ introduced had an arbitrary distribution of values.

The simultaneous distribution of the canonical correlations is fundamental
but on finding it not all difficulties are thereby resolved. As mentioned above,
either of the quantities, $z$ or $q$, as they appear in Hotelling's paper, furnish
over-all tests, or rather they would if their distribution functions were obtained in a
satisfactory form. The form of the distribution of $z$ for complete independence
was given by Wilks as early as 1932 [2] but that of $q$ for $s > 2$ is still lacking.
For $s > 2$ there are difficulties in applications even with $z$ and in 1938 [13]
M. S. Bartlett proposed a more convenient approximate test. Ordinarily,
however, one would want to test the largest canonical correlation alone for
significance. There are two kinds of trouble here. First, there is no assurance that
the largest observed canonical correlation corresponds to the largest one in the
population. Second, it is quite important to know whether the remaining
population correlations are zero or not. Bartlett in 1941 [14] discussed these points.

Now I make an abrupt change in subject. Some interesting work has been
done on the theory of runs and its applications during the last five years.

First, I want to try to convey some idea of the contents of three papers by
W. O. Kermack and A. G. McKendrick published in 1937 [15, 16] and 1938 [17].
Suppose we have an unlimited set of numbers, no two of which are equal, and
start drawing from them at random, recording the numbers in sequence as they
come. Within the sequence drawn there will occur runs up and runs down of
varying lengths. Thus in the sequence of 10 numbers, 2, 6, 11, 8, 9, 4, 3, 7, 14,
12, there are 3 runs up, one of length 2 and 2 of length 3, and 3 runs down,
2 of length 2 and one of length 3. Both ends of a run are counted in finding its
length; no run can have a length less than 2. The total number of runs is 6 of
which 3 are of length 2 and 3 are of length 3. We can also count the gaps which
extend from crest to crest or from trough to trough and note their lengths with
the convention that again both ends are counted in determining a length, so
that no gap length is less than 3. Thus in the sequence of 10 numbers above
there is one gap of length 3, 3 of length 4, and one of length 5.
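The counting conventions just described can be made concrete with a short Python sketch. It is only an illustration: the function names are hypothetical, the numbers are assumed to be all different, and the two end points of the sequence are treated as turning points (an assumption that reproduces the three gaps of length 4 in the example).

```python
from collections import Counter


def turning_points(seq):
    """Indices of crests and troughs, with both end points included."""
    pts = [0]
    for i in range(1, len(seq) - 1):
        # A turning point is where the direction of movement changes.
        if (seq[i + 1] > seq[i]) != (seq[i] > seq[i - 1]):
            pts.append(i)
    pts.append(len(seq) - 1)
    return pts


def run_lengths(seq):
    """Lengths of runs up and runs down, counting both ends of each run."""
    pts = turning_points(seq)
    ups, downs = [], []
    for a, b in zip(pts, pts[1:]):
        (ups if seq[b] > seq[a] else downs).append(b - a + 1)
    return ups, downs


def gap_lengths(seq):
    """Lengths of gaps from crest to crest or trough to trough."""
    pts = turning_points(seq)
    # Consecutive turning points alternate in kind, so like points are two apart.
    return [c - a + 1 for a, c in zip(pts, pts[2:])]


sequence = [2, 6, 11, 8, 9, 4, 3, 7, 14, 12]
ups, downs = run_lengths(sequence)
gaps = gap_lengths(sequence)
up_table, down_table, gap_table = Counter(ups), Counter(downs), Counter(gaps)
```

Each table maps a length to the number of runs (or gaps) of that length, which is the form in which observed and expected frequencies would be compared.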

It is clear that if we know the distribution for runs or for gaps of different
lengths we can compare an observed distribution of runs or gaps by lengths with
the frequencies calculated on the hypothesis of randomness and be by way of
acquiring a test of this hypothesis. To be brief, in these papers these theoretical
distributions are found together with their means and variances. There are some
interesting applications. A series of random sampling numbers and a series of
selected ... numbers passed the $\chi^2$-test as random and also passed the test
based on the departure of the mean from its expected value compared with its
standard error. On the other hand, the series of English death rates for the
years ... could not conceivably be random. This investigation was suggested in
the first place by the fluctuations of the death rate from ... in mice in an
experimentally induced epidemic.

The problems here dealt with had been only partially solved by earlier writers.
There is much interesting material in these papers I have no space for. The
authors readily include the case in which the numbers comprising the population
are not all different. They also studied series of limited length, series arranged
in a cycle or ring and even what may be termed a Möbius ... .

A. M. Mood in 1940 [18] in an interesting paper investigated a different form
of the problem of runs. Suppose we have $n$ elements of two kinds, say $n_1$ $a$'s
and $n_2 = n - n_1$ $b$'s, and that these are arranged at random in a line. For
example, if $n_1 = 5$ and $n_2 = 7$, and if a random arrangement of these 12 elements
is babbbabbaaab, the $a$'s occur in 2 runs of one and in one run of 3 and the $b$'s
come in 2 runs of one, in one run of 2 and in one run of 3. If $r_{ij}$, $i = 1, 2$, is
the number of runs of length $j$ of elements of variety $i$, Mood finds the probability of
obtaining a given set of values of $r_{ij}$ such that $\sum_j j\, r_{ij} = n_i$, i.e., of
obtaining a given pattern of runs in the two kinds of objects. From this
basic distribution function he obtains certain marginal distributions such as
that for the occurrence of a given set of runs in the $a$'s regardless of how the $b$'s
fall (except that they must provide the necessary points of division), or that for
$r_1$ and $r_2$ if these are respectively the total number of runs of $a$'s and of $b$'s or
that for $r_1$ or $r_2$ alone. He finds the factorial moments of these variables and
... covariances. Similar results are obtained in ... drawings from an infinite
population in which two kinds of elements occur in fixed proportions. Finally,
he treats the behavior of the distributions studied as the sample size increases.
As Mood notes, ... In a paper antedating Mood's by some six months, A. Wald
and J. Wolfowitz [19] used the distribution of the total number of runs of two
kinds of elements to provide a test of the hypothesis that two samples are from
the same population with a continuous distribution law. If the observations in
the two samples combined are arranged in order of magnitude and if then the
observations from the first sample are each replaced by a zero and those from
the second are each replaced by a one, we have a situation to which this
distribution function for runs applies. W. L. Stevens in 1939 [20] also discussed
an application of this distribution.
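Both the tabulation of runs by kind and length and the reduction of a two-sample problem to a sequence of zeros and ones are easy to sketch in Python. The function names and the two small samples below are hypothetical; the sketch only counts runs and does not reproduce any of the distribution formulas of the papers cited. Ties between the samples are assumed not to occur, as under a continuous distribution law.

```python
from collections import Counter
from itertools import groupby


def runs_by_kind(arrangement):
    """For each kind of element, count the runs of each length."""
    table = {}
    for kind, group in groupby(arrangement):
        table.setdefault(kind, Counter())[len(list(group))] += 1
    return table


def two_sample_run_count(first, second):
    """Total number of runs of 0's and 1's after pooling and ordering.

    Observations from the first sample are labelled 0, those from the
    second sample 1.
    """
    pooled = sorted([(x, 0) for x in first] + [(x, 1) for x in second])
    labels = [label for _, label in pooled]
    return sum(1 for _ in groupby(labels))


mood_example = runs_by_kind("babbbabbaaab")

# Made-up samples: pooled order gives the labels 0, 1, 1, 0, 0, 1.
total_runs = two_sample_run_count([1.2, 3.4, 5.0], [2.1, 2.8, 6.3])
```

A small total number of runs suggests that the two samples occupy different parts of the scale, which is the intuition behind using the run count as a test criterion.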

The third principal topic I have chosen for my remarks is developments in
the use of the probability integral transformation. The use of this device at all
seems to be quite recent, appearing in a paper by H. Cramér in 1928 [21] who
invented a test of goodness of fit which reappeared as the "$\omega^2$-test" in apparently
independent work of R. von Mises in 1931 [22]. In 1932 in a section new in the
fourth edition of "Statistical Methods for Research Workers," [23] Fisher showed
the usefulness of this transformation in combining independent tests of
significance and in 1933 and 1934 Karl Pearson [24, 25] had papers in Biometrika on
the subject.

As for the transformation itself, suppose that $p(x)$ is the probability density
function of a continuous variable $x$ defined on the range $(a, b)$ such that,

$$\int_a^b p(x)\, dx = 1.$$

Then let us introduce the variable,

$$y = \int_a^x p(x)\, dx,$$

which is the probability that a value of the variable at random will be less than $x$.
It will be seen that since $x$ is a random variable, the proportion of population
values less than an $x$ drawn at random is itself a random variable. Perhaps
this will be clearer if I use a simple example of J. Neyman's to show how a
sample of $x$'s also determines a sample of $y$'s for a given $p(x)$. Suppose that,

$$p(x) = \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2},$$

and that a sample of 5 values of $x$ arranged in order of magnitude is: -1.5,
-1.1, -0.5, 0.6, 1.6. Then by reference to a table of areas under the normal
curve of error, we find that the corresponding observed $y$'s are: 0.067, 0.136,
0.309, 0.726, 0.945. It is obvious that the range for $y$ is always, for any $p(x)$,
(0, 1). Further if $f(y)$ is the probability density function for $y$, of course,

$$f(y)\, dy = p(x)\, dx.$$

But from the definition of $y$,

$$dy = p(x)\, dx,$$

so that $f(y) = 1$. Thus, quite independently of $p(x)$, $y$ obeys a rectangular
distribution law on the range (0, 1).
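The table look-up in Neyman's example can be reproduced with a few lines of Python. This is a minimal sketch that assumes the unit normal density and reads the five sample values as -1.5, -1.1, -0.5, 0.6, 1.6; the function name is illustrative.

```python
import math


def normal_probability_integral(x):
    """Area under the unit normal curve to the left of x."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


sample_x = [-1.5, -1.1, -0.5, 0.6, 1.6]
sample_y = [round(normal_probability_integral(x), 3) for x in sample_x]
```

Whatever continuous density is assumed, the transformed values lie in (0, 1), and they are rectangularly distributed there when the assumed density is the true one.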

This simplicity of the distribution of the quantity $y$ and its independence of
$p(x)$ are most attractive properties. I shall note the applications
that have been made in recent years.

In 1936 W. R. Thompson [26] denoted by $p_k$ the probability that in a sample
of $N$ a randomly chosen $x$ will be less than $x_k$, the $k$-th value. Then
the probability that $p' \le p_k \le p''$ is just $p'' - p'$. The probability that exactly
$r$ other members of the sample will be less than $x_k$ is then,

$$\binom{N-1}{r}\, p_k^{\,r} (1 - p_k)^{N-r-1}.$$

Further for all samples in which just $r$ values occur less than $x_k$, the proportion
of occasions on which $p' \le p_k \le p''$ is given by

$$\int_{p'}^{p''} \frac{p^{\,r} (1 - p)^{N-r-1}}{B(r + 1, N - r)}\, dp,$$

the difference of two incomplete $\beta$-functions. But that there are exactly $r$
observed $x$'s less than $x_k$ is equivalent to saying that $x_k$ is the $(r + 1)$-st
observation in order of magnitude, so that in the above we may as well replace $r$ by
$k - 1$. It is easy to find that the expected value of $p_k$ in such samples is
$\frac{k}{N + 1}$ and that the variance is $\frac{k(N - k + 1)}{(N + 1)^2 (N + 2)}$.
It follows from the first of these two expressions that the proportion of occasions
on which $x_k < x < x_{N-k+1}$ is $1 - \frac{2k}{N + 1}$, $(N + 1 > 2k)$. Statements of
this kind establish confidence limits. Thus if one says that in a sample of $N$,
an observation at random will fall between the $k$-th and the $(N - k + 1)$-st
observations in order of magnitude, such a statement has a probability of
$1 - \frac{2k}{N + 1}$ of being true. Or, the integral just above is the fiducial
probability of the truth of $p' \le p_k \le p''$ if in a sample of $N$ the $k$-th
observation is the $(r + 1)$-st in order of magnitude.
Thompson went on to obtain confidence limits for the median in a sample of $N$
from any population.
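The mean, variance and coverage just quoted can be checked in exact arithmetic. The sketch below is an illustration under one assumption: with $r = k - 1$ the distribution of $p_k$ is taken to have density proportional to $p^{k-1}(1-p)^{N-k}$, whose moments are ratios of factorials. The function names are hypothetical.

```python
from fractions import Fraction
from math import factorial


def moment_of_p_k(k, N, m):
    """m-th moment of p_k when its density is proportional to
    p**(k - 1) * (1 - p)**(N - k) on (0, 1)."""
    return Fraction(factorial(k + m - 1) * factorial(N),
                    factorial(k - 1) * factorial(N + m))


def mean_and_variance(k, N):
    first = moment_of_p_k(k, N, 1)
    second = moment_of_p_k(k, N, 2)
    return first, second - first * first


def coverage_between(k, N):
    """Expected proportion of the population lying between the k-th and
    the (N - k + 1)-st observations in order of magnitude."""
    return moment_of_p_k(N - k + 1, N, 1) - moment_of_p_k(k, N, 1)


N, k = 9, 2
mean_pk, var_pk = mean_and_variance(k, N)
coverage = coverage_between(k, N)
```

For $N = 9$ and $k = 2$ the statement "a further observation falls between the 2nd and the 8th ordered values" thus holds with probability 3/5, whatever the continuous population.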


In 1939 Wald and Wolfowitz [27] studied the problem of obtaining confidence
limits for $v(x)$, the proportion of observations in a sample of $N$ with values
less than a given $x$, the population obeying any continuous distribution law.
Their arguments are too complicated to attempt to sketch them here, but they
are based on the fact that the transformed variable, $y$, as defined above, is
rectangularly distributed on the interval (0, 1). With their exact solution they
gave a more convenient approximate method for calculation in applications.

In 1938 (I am not being strictly chronological) E. S. Pearson [28] published
a study of test criteria based on this probability integral transformation.
Suppose that we have $n$ independently observed $y$'s, $y_1, y_2, \dots, y_n$. How should
the $y$'s be used to test the hypothesis that the observations from which the $y$'s
were calculated all came from the same population? K. Pearson [24] had
already proposed the use of $Q = y_1 y_2 \cdots y_n$ or $Q' = (1 - y_1)(1 - y_2) \cdots
(1 - y_n)$. It is known that a simple function of $Q$ or of $Q'$ obeys a $\chi^2$-distribution
with $2n$ degrees of freedom so that we have a ready means of combining
independent tests based on $Q$ or $Q'$. But how is one to choose among $Q$, $Q'$, or
other functions of the $y$'s that might be suggested? E. S. Pearson emphasized
the role that the hypotheses conceived as alternate to the one being tested
should play in making such a choice. He illustrates this in a case of testing
the hypothesis that a sample came from a normal population of zero mean and
unit variance and in which the alternate populations, from one of which the
sample might have been drawn, are such that the corresponding $y$'s calculated
on the hypothesis being tested would follow a Pearson type I distribution law.
Using the likelihood principle he was led in this case to $Q$ or $Q'$, which are then
concluded to be "best possible tests."
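The combination of independent tests through $Q$ can be sketched as follows. The text does not name the "simple function" of $Q$; the sketch assumes it is $-2 \log_e Q$, and it uses the closed form of the $\chi^2$ probability integral for an even number of degrees of freedom. The function name is hypothetical.

```python
import math


def combined_probability(ys):
    """Probability that chi-square with 2n degrees of freedom exceeds
    -2 * log(Q), where Q is the product of the n values in ys.

    For 2n degrees of freedom the upper tail at the value 2h is
    exp(-h) * sum_{j=0}^{n-1} h**j / j!.
    """
    n = len(ys)
    h = -sum(math.log(y) for y in ys)     # h = -log Q = chi-square / 2
    term, total = 1.0, 0.0
    for j in range(n):
        total += term                     # adds h**j / j!
        term *= h / (j + 1)
    return math.exp(-h) * total


single = combined_probability([0.05])
pair = combined_probability([0.1, 0.1])
```

With a single $y$ the combined probability is just $y$ itself; with two values of 0.1 it is about 0.056, less than either value taken alone.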

The final paper I want to discuss is an important one by J. Neyman on the
"Smooth test of goodness of fit," published in 1937 [29]. Suppose again that a
random sample of $N$ values of $x$ gives the set, $y_1, y_2, \dots, y_N$ on the hypothesis
$H_0$ that the population distribution law is $p(x \mid H_0)$. If $H_0$ is true the $y$'s in
random samples do follow a rectangular distribution on (0, 1). But what would
be the distribution of the $y$'s if the distribution law for the population were
actually $p(x \mid H_1)$? We have for the $y$'s as calculated,

$$y = \int_a^x p(x \mid H_0)\, dx.$$

But to find $f(y)$,

$$f(y)\, dy = p(x \mid H_1)\, dx,$$

so that,

$$f(y) = \frac{p(x \mid H_1)}{p(x \mid H_0)} \ne 1.$$

Therefore if $H_0$ is not true, the $y$'s calculated on the assumption that it is may
be expected to exhibit a statistically significant set of deviations from a
rectangular distribution.

As Neyman remarks, it is a defect of the $\chi^2$-test of goodness of fit that the
information one has of the algebraic signs of the differences between calculated
and observed frequencies, particularly of the way in which positive and negative
differences succeed each other, is completely unused. And in forming a test of a
statistical hypothesis it is now well understood, thanks to Neyman and Pearson,
that due account should be taken of the alternate hypotheses conceivably true.

Neyman begins by specifying a wide class of alternate hypotheses in a form
that lends itself to mathematical treatment. This is done by assuming that the
distribution of $y$'s calculated for $H_0$ will, if an alternate $H_1$ is true, be given by a
function of the form,

$$f(y) = c \exp\Big\{ \sum_{i=1}^{k} \theta_i \pi_i(y) \Big\},$$

in which $c$ is a constant and $\pi_i(y)$ is a polynomial of degree $i$ (a transformed
Legendre polynomial) with convenient properties. For low values of $k$, such as
will ordinarily be used, this permits alternate distribution curves to deviate in a
smooth manner from the distribution tested, with a limited number of
intersections with it.

Now the problem is to determine the function of the observed $y$'s which will
provide a suitable test of $H_0$ with respect to the alternate hypotheses of order or
class $k$, $k$ having been decided upon in advance of making the test. The
mathematics, proceeding along Neyman and Pearson lines, shows that the appropriate
function, for large samples at least, is simply $\sum_{i=1}^{k} u_i^2$ in which

$$u_i = \frac{1}{\sqrt{N}} \sum_{j=1}^{N} \pi_i(y_j),$$

the $u_i$'s being calculated from the sample. Moreover, the probability that the
sum $\sum_{i=1}^{k} u_i^2$ exceeds a given value is at once obtained from a table of the
incomplete $\Gamma$-function, i.e., this sum is proportional to a $\chi^2$.
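The criterion of the smooth test can be computed from the transformed observations in a few lines. The sketch below rests on assumptions not spelled out in the text: the polynomials are taken to be the Legendre polynomials $P_i$ evaluated at $2y - 1$ and scaled by $\sqrt{2i + 1}$ so as to be orthonormal on (0, 1), and $u_i$ is taken to be the sum of the $i$-th polynomial over the sample divided by $\sqrt{N}$. The function names and the tiny samples are hypothetical.

```python
import math


def pi_polynomial(i, y):
    """Orthonormal Legendre polynomial of degree i on (0, 1).

    Uses the recurrence (n + 1) P_{n+1}(t) = (2n + 1) t P_n(t) - n P_{n-1}(t)
    with t = 2y - 1, then scales by sqrt(2i + 1).
    """
    t = 2.0 * y - 1.0
    if i == 0:
        return 1.0
    previous, current = 1.0, t
    for n in range(1, i):
        previous, current = current, ((2 * n + 1) * t * current - n * previous) / (n + 1)
    return math.sqrt(2 * i + 1) * current


def smooth_test_criterion(ys, k):
    """Sum of u_i**2 for i = 1, ..., k."""
    n_obs = len(ys)
    total = 0.0
    for i in range(1, k + 1):
        u_i = sum(pi_polynomial(i, y) for y in ys) / math.sqrt(n_obs)
        total += u_i * u_i
    return total


one_point = smooth_test_criterion([0.5], 2)
two_points = smooth_test_criterion([0.25, 0.75], 2)
```

In both toy samples the first-degree component vanishes by symmetry about 1/2, so the whole criterion comes from the second-degree polynomial, which responds to the $y$'s being too concentrated or too dispersed.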

This is a very fine piece of work but, as Neyman points out, there are still
questions to be settled concerning the general utility of this "smooth" test.
F. N. David in 1939 [30] further discusses this test. In particular, it may be
pointed out that the parameters in $p(x \mid H_0)$ must be assumed known; what
would be the effect on the test of estimating these parameters is unknown. A
reasonably large sample seems to be required to make the developments on the
assumption of large samples applicable but a $y$ must be calculated for each
observation. This makes for a good deal of computing but it is not known how
grouping of observations might be effected. And the matter of the choice of the
order of the test to be applied, i.e., of a value of $k$, is still somewhat in doubt.

I will not debate the proposition that there are papers completely omitted
from this discussion as important as those I have included however inadequately.
The limitations of space forced me to choose and it is quite possible that my
personal tastes and interests had more weight than they should.


REFERENCES 


[1] H. Hotelling, "Relations between two sets of variates," Biometrika, 28 (1936), 321-377.

[2] S. S. Wilks, "Certain generalizations in the analysis of variance," Biometrika, 24
(1932), 471-494.

[3] R. A. Fisher, "The use of multiple measurements in taxonomic problems," Annals
of Eugenics, 7 (1936), 179-188.

[4] H. Hotelling, "The generalization of Student's ratio," Annals of Math. Stat., 2
(1931), 360-378.

[5] R. A. Fisher, "The statistical utilization of multiple measurements," Annals of
Eugenics, 8 (1938), 376-386.

[6] R. A. Fisher, "The sampling distribution of some statistics obtained from non-linear
equations," Annals of Eugenics, 9 (1939), 238-249.

[7] R. A. Fisher, "The precision of discriminant functions," Annals of Eugenics, 10
(1940).

[8] P. C. Mahalanobis, "On the generalized distance in statistics," Proc. Nat. Inst.
Sci. India, 12 (1936), 49-55.

[9] R. C. Bose, "On the exact distribution of the $D^2$ statistic," Sankhyā, 2 (1936), 143-154.

[10] R. C. Bose and S. N. Roy, "The exact distribution of the Studentized $D^2$ statistic,"
Sankhyā, 4, part 4.

[11] P. L. Hsu, "On the distribution of roots of certain determinantal equations," Annals
of Eugenics, 9 (1939), 250-258.

[12] M. A. Girshick, "On the sampling theory of the roots of determinantal equations,"
Annals of Math. Stat., 10 (1939), 203-224.

[13] M. S. Bartlett, "Further aspects of the theory of multiple regression," Proc. Camb.
Phil. Soc., 34 (1938), 33-40.

[14] M. S. Bartlett, "The statistical significance of canonical correlations," Biometrika,
32 (1941), 29-37.

[15] W. O. Kermack and A. G. McKendrick, "Tests for randomness in a series of numerical
observations," Roy. Soc. Edin. Proc., 57 (1937), 228-240.

[16] W. O. Kermack and A. G. McKendrick, "Some distributions associated with a
randomly arranged set of numbers," Ibid., 332-376.

[17] W. O. Kermack and A. G. McKendrick, "Some properties of points arranged at
random on a Möbius surface," Mathematical Gazette, 22 (1938), 66-72.

[18] A. M. Mood, "The distribution theory of runs," Annals of Math. Stat., 11 (1940),
367-392.

[19] A. Wald and J. Wolfowitz, "On a test whether two samples are from the same
population," Annals of Math. Stat., 11 (1940), 147-162.

[20] W. L. Stevens, "Distribution of groups in a sequence of alternatives," Annals of
Eugenics, 9 (1939), 10-17.

[21] H. Cramér, "On the composition of elementary errors, Second paper, Statistical
applications," Skandinavisk Aktuarietidskrift, 11 (1928), 141-180.

[22] R. von Mises, "Vorlesungen aus dem Gebiete der angewandten Mathematik," Bd. 1;
Wahrscheinlichkeitsrechnung, (Leipzig, 1931), 316-335.

[23] R. A. Fisher, Statistical Methods for Research Workers, (Edinburgh, 4th edition, 1932),
Article 21.1.

[24] K. Pearson, "On a method of determining whether a sample of size n supposed to have
been drawn from a parent population having a known probability integral has
probably been drawn at random," Biometrika, 25 (1933), 379-410.

[25] K. Pearson, "On a new method of determining 'goodness of fit,'" Biometrika, 26
(1934), 425-442.

[26] W. R. Thompson, "On confidence ranges for the median and other expectation
distributions for populations of unknown distribution form," Annals of Math.
Stat., 7 (1936), 122-128.

[27] A. Wald and J. Wolfowitz, "Confidence limits for continuous distribution functions,"
Annals of Math. Stat., 10 (1939), 105-118.

[28] E. S. Pearson, "The probability integral transformation for testing goodness of fit
and combining independent tests of significance," Biometrika, 30 (1938), 134-148.

[29] J. Neyman, "A smooth test of goodness of fit," Skandinavisk Aktuarietidskrift, 20
(1937), 149-199.

[30] F. N. David, "On Neyman's 'smooth' test of goodness of fit, I. Distribution of the
criterion $\psi^2$ when the hypothesis tested is true," Biometrika, 31 (1939), 191-199.



NOTES 

This section is devoted to brief research and expository articles, notes on methodology
and other short items.


A FURTHER REMARK CONCERNING THE DISTRIBUTION OF THE 
RATIO OF THE MEAN SQUARE SUCCESSIVE DIFFERENCE 
TO THE VARIANCE* 

By John von Neumann
Institute for Advanced Study²

1. Introduction. In our previous paper¹ it was found convenient to assume
that the number $m$ (of the variables of the quadratic form under consideration)
is even. (Cf. p. 383, loc. cit.) This means that in the application to the mean
square successive difference $n = m + 1$ must be odd. (Cf. id.)
In this note we shall show that the distribution for an odd $m$ (i.e. for an even $n$)
can be expressed by means of the distribution for an even $m$, the latter being
already known, loc. cit.

Specifically, consider the distribution of $\gamma = \sum_{\mu=1}^{m} a_\mu x_\mu^2$, for $(x_1, \cdots, x_m)$
equidistributed over the surface $\sum_{\mu=1}^{m} x_\mu^2 = 1$. Denote the $m$-uplet $(a_1, \cdots, a_m)$
by $A$; then the distribution function of $\gamma$ depends on $A$; denote that distribution
by $\omega_A(\gamma)$. (Cf. p. 372, id.; we write $a_\mu$ for the coefficients used there.)

Now consider an $m$-uplet $A = (a_1, \cdots, a_m)$ and a $p$-uplet $B = (b_1, \cdots, b_p)$
and form the $m + p$-uplet $C = (a_1, \cdots, a_m, b_1, \cdots, b_p)$. Write $C = A + B$.
Then we shall show that there exists a simple expression for $\omega_C(\gamma)$ in terms of
$\omega_A(\gamma)$ and $\omega_B(\gamma)$.

For the specific application to the mean square successive difference, we can
put $n = m + 2$, $A = (\cos(\pi\mu/n)$ for $\mu = 1, \cdots, \tfrac12 n - 1,\ \tfrac12 n + 1, \cdots, n - 1)$,
$B = (0)$, $C = A + B = (\cos(\pi\mu/n)$ for $\mu = 1, \cdots, n - 1)$.

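¹ Cf. the paper by the same author, Annals of Math. Stat., vol. 12 (1941), pp. 367-395.
² Also Scientific Advisory Committee of the Ballistic Research Laboratory, Aberdeen
Proving Ground.

2. The recursion formula. We proceed as follows. $\omega_A(\gamma)$ can also be used
to express the joint statistics of

$$\gamma = \sum_{\mu=1}^{m} a_\mu x_\mu^2 \quad\text{and}\quad \rho = \sum_{\mu=1}^{m} x_\mu^2,$$

or better, the volume of that part of the $x_1, \cdots, x_m$-space which corresponds
to any given domain in the $\gamma, \rho$-plane. Thus the volume corresponding to a
given infinitesimal $\gamma, \rho$ domain $d\gamma\, d\rho$ will clearly be

$$\tfrac12 C_m\, \rho^{m/2-1}\, \omega_A\!\left(\frac{\gamma}{\rho}\right) \frac{d\gamma}{\rho}\, d\rho,$$

where $C_m$ is the ($(m-1)$-dimensional) area of the $x_1, \cdots, x_m$-surface $\sum_{\mu} x_\mu^2 = 1$
(unit sphere), i.e., this volume is

(1) $$\tfrac12 C_m\, \rho^{m/2-2}\, \omega_A\!\left(\frac{\gamma}{\rho}\right) d\gamma\, d\rho.$$

Similarly for

$$\zeta = \sum_{\nu=1}^{p} b_\nu u_\nu^2 \quad\text{and}\quad \sigma = \sum_{\nu=1}^{p} u_\nu^2$$

the volume corresponding to the infinitesimal $\zeta, \sigma$ domain $d\zeta\, d\sigma$ is

(2) $$\tfrac12 C_p\, \sigma^{p/2-2}\, \omega_B\!\left(\frac{\zeta}{\sigma}\right) d\zeta\, d\sigma.$$

Finally for $\theta = \gamma + \zeta = \sum_{\mu=1}^{m} a_\mu x_\mu^2 + \sum_{\nu=1}^{p} b_\nu u_\nu^2$ and $\tau = \rho + \sigma = \sum_{\mu=1}^{m} x_\mu^2 + \sum_{\nu=1}^{p} u_\nu^2$
the volume corresponding to the infinitesimal $\theta, \tau$ domain $d\theta\, d\tau$ is

(3) $$\tfrac12 C_{m+p}\, \tau^{(m+p)/2-2}\, \omega_C\!\left(\frac{\theta}{\tau}\right) d\theta\, d\tau.$$

Now $\theta = \gamma + \zeta$, $\tau = \rho + \sigma$ connect (1), (2), (3) as follows:

$$\tfrac12 C_{m+p}\, \tau^{(m+p)/2-2}\, \omega_C\!\left(\frac{\theta}{\tau}\right) = \tfrac14 C_m C_p \int_0^{\tau} d\rho \int_{-\infty}^{\infty} d\gamma\; \rho^{m/2-2}\, \omega_A\!\left(\frac{\gamma}{\rho}\right) (\tau - \rho)^{p/2-2}\, \omega_B\!\left(\frac{\theta - \gamma}{\tau - \rho}\right).$$

This gives (either by simply putting $\tau = 1$, or else by replacing $\theta, \gamma, \rho$ by $\tau\theta$,
$\tau\gamma$, $\tau\rho$)

$$\frac{C_{m+p}}{C_m C_p}\, \omega_C(\theta) = \tfrac12 \int_0^1 d\rho \int_{-\infty}^{\infty} d\gamma\; \rho^{m/2-2} (1 - \rho)^{p/2-2}\, \omega_A\!\left(\frac{\gamma}{\rho}\right) \omega_B\!\left(\frac{\theta - \gamma}{1 - \rho}\right).$$

To determine $C_{m+p}/(C_m C_p)$ apply to this $\int d\theta \cdots$. Then

$$\frac{C_{m+p}}{C_m C_p} = \tfrac12 \int_0^1 d\rho\; \rho^{m/2-1} (1 - \rho)^{p/2-1} = \frac{\Gamma(\tfrac12 m)\,\Gamma(\tfrac12 p)}{2\,\Gamma(\tfrac12 (m+p))}.$$

Accordingly:

(I) $$\omega_C(\theta) = \frac{\Gamma(\tfrac12 (m+p))}{\Gamma(\tfrac12 m)\,\Gamma(\tfrac12 p)} \int_0^1 d\rho \int_{-\infty}^{\infty} d\gamma\; \rho^{m/2-2} (1 - \rho)^{p/2-2}\, \omega_A\!\left(\frac{\gamma}{\rho}\right) \omega_B\!\left(\frac{\theta - \gamma}{1 - \rho}\right).$$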



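3. The special case. Let us now return to the case mentioned at the
end of section 1: the application to the mean square successive difference.

There $p = 1$ and $B = (0)$, so that the "distribution" of $\zeta$ is concentrated at
the point 0. Hence $\omega_B(\zeta)$ is an "improper" distribution, which can nevertheless be treated in the
same way.³ Using $C$ and $A$ described at the end of section 1, the above formula
becomes (now $m = n - 2$, $p = 1$)

(II) $$\omega_C(\theta) = \frac{\Gamma(\tfrac12 (n-1))}{\Gamma(\tfrac12 (n-2))\,\Gamma(\tfrac12)} \int_0^1 d\rho\; \rho^{n/2-3} (1 - \rho)^{-1/2}\, \omega_A\!\left(\frac{\theta}{\rho}\right).$$

It would have been equally easy, of course, to establish (II) directly.
Putting $\rho = 1/t$ gives

(III) $$\omega_C(\theta) = \frac{\Gamma(\tfrac12 (n-1))}{\Gamma(\tfrac12 (n-2))\,\Gamma(\tfrac12)} \int_1^{\infty} t^{(3-n)/2} (t - 1)^{-1/2}\, \omega_A(\theta t)\, dt.$$

Since $\omega_A(\gamma)$ vanishes for $|\gamma| > \cos(\pi/n)$, we may replace the integral
$\int_1^{\infty}$ by $\int_1^{\cos(\pi/n)/|\theta|}$.

Formula (III) can be used for numerical work, and also to extend the formula
(3) on p. 391, loc. cit., to even values of n.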
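
As a hedged numerical illustration of the normalizing constant met in section 2 (this sketch is not part of the original note): the ratio of the sphere areas $C_{m+p}/(C_m C_p)$ should agree with $\Gamma(\tfrac12 m)\Gamma(\tfrac12 p)/(2\Gamma(\tfrac12(m+p)))$ and with one half of the integral of $\rho^{m/2-1}(1-\rho)^{p/2-1}$ over $(0, 1)$. The code assumes the standard area formula $C_k = 2\pi^{k/2}/\Gamma(k/2)$; all function names are hypothetical.

```python
import math

def sphere_area(k):
    """Area of the unit sphere in k dimensions (assumed standard formula)."""
    return 2.0 * math.pi ** (k / 2.0) / math.gamma(k / 2.0)

def area_ratio(m, p):
    """C_{m+p} / (C_m * C_p), computed from the sphere areas."""
    return sphere_area(m + p) / (sphere_area(m) * sphere_area(p))

def gamma_ratio(m, p):
    """Gamma(m/2) * Gamma(p/2) / (2 * Gamma((m+p)/2))."""
    return math.gamma(m / 2.0) * math.gamma(p / 2.0) / (2.0 * math.gamma((m + p) / 2.0))

def half_beta_integral(m, p, steps=200000):
    """Midpoint-rule value of (1/2) * int_0^1 rho^(m/2-1) * (1-rho)^(p/2-1) d rho."""
    h = 1.0 / steps
    total = 0.0
    for i in range(steps):
        rho = (i + 0.5) * h
        total += rho ** (m / 2.0 - 1.0) * (1.0 - rho) ** (p / 2.0 - 1.0)
    return 0.5 * total * h
```

The midpoint rule is used only for simplicity; for $p = 1$ the integrand is singular at $\rho = 1$ and a quadrature suited to endpoint singularities would be preferable.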


CONVEXITY PROPERTIES OF GENERALIZED MEAN VALUE

FUNCTIONS

By E. F. Beckenbach
University of Michigan

In an article appearing in the Annals of Mathematical Statistics¹ it was pointed
out that while the mean value functions appearing below have been studied
and used since 1840, there appeared to have been no attempt made to investigate
the behavior of their second derivatives.

Consider (1) the unit weight or simple sample form

$$\varphi(t) = \left(\frac{x_1^t + x_2^t + \cdots + x_n^t}{n}\right)^{1/t},$$

in which the $x_i$ are positive numbers and in which $t$ may take any real value;
(2) the weighted sample form

$$\psi(t) = \left(\frac{c_1 x_1^t + c_2 x_2^t + \cdots + c_n x_n^t}{c_1 + c_2 + \cdots + c_n}\right)^{1/t},$$

in which the $c_i$ are positive numbers, and in which the $x_i$ and $t$ are restricted as
in (1); (3) the integral form

$$\theta(t) = \left(\frac{1}{x_2 - x_1}\int_{x_1}^{x_2} [f(x)]^t\, dx\right)^{1/t},$$

in which $f(x)$ is a positive continuous function for $x_1 \le x \le x_2$.

Since the analysis and results are essentially the same in all three cases, we
restrict our attention to $\theta(t)$.

³ "Dirac's famous delta function." It could be described by a Stieltjes integral.

¹ Nilan Norris, "Convexity properties of generalized mean value functions," Annals of
Math. Stat., Vol. 8 (1937), pp. 118-120.

As is well known,² $\theta(t)$ is a monotone non-decreasing function which varies
from the minimum of $f(x)$ to the maximum of $f(x)$ as $t$ increases from $-\infty$ to
$+\infty$. It is further of some importance to study the rate at which the rate of
increase of this type bias is changing as $t$ increases; the rate in question is given
by the second derivative $\theta''(t)$.

The following points were made by Norris, loc. cit.: (1) Since, as we have
pointed out, $\theta(t)$ has two horizontal asymptotes, $\theta(t)$ must have at least one
inflection point. (2) Consideration of a simple example shows that there is not
necessarily an inflection point at $t = 0$; $\theta''(0)$ can be made to take on any real
value.

Thus it is not true that $\theta''(t)$ must be positive for all $t < 0$ and negative for
all $t > 0$. On the other hand, we shall give simple bounds for $\theta''(t)$ in the other
direction; namely, we shall give a positive upper bound of $\theta''(t)$ for $t < 0$ and a
lower bound for $t > 0$. These bounds are precise in the sense that they are
actually taken on in the special case $f(x) \equiv$ const. Their main advantage lies
in the fact that while the expression for $\theta''(t)$ is quite involved, these bounds are
simple expressions in the quantities $\theta(t)$ and $\theta'(t)$ which might already have
been computed.

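Let

$$\lambda(t) \equiv \log \theta(t).$$

Differentiating, we obtain

$$t^2 \lambda'(t) = \frac{\int_{x_1}^{x_2} [f(x)]^t \log [f(x)]^t\, dx}{\int_{x_1}^{x_2} [f(x)]^t\, dx} - \log\left(\frac{1}{x_2 - x_1}\int_{x_1}^{x_2} [f(x)]^t\, dx\right).$$

It follows³ that

$$\lambda'(t) \ge 0$$

and

$$\theta'(t) \ge 0.$$

Let

$$\mu(t) \equiv t^2 \lambda'(t).$$

Curiously, while $\lambda''(t)$ and $\theta''(t)$ appear to be rather formidable, the
related quantity $\mu'(t)$ is made relatively simple by the fact that two of the
terms obtained by formal differentiation are negatives of each other; and
Schwarz' inequality can be applied to the remaining terms as follows.

We obtain

$$\left(\int_{x_1}^{x_2} [f(x)]^t\, dx\right)^2 \mu'(t) = t\left[\left(\int_{x_1}^{x_2} [f(x)]^t\, dx\right)\left(\int_{x_1}^{x_2} [f(x)]^t \{\log f(x)\}^2\, dx\right) - \left(\int_{x_1}^{x_2} [f(x)]^t \log f(x)\, dx\right)^2\right].$$

By Schwarz' inequality,⁴ it follows that

$$\mu'(t) = t\,\chi(t),$$

with

$$\chi(t) \ge 0,$$

the sign of equality holding if and only if $f(x) \equiv$ const.
From the definition of $\mu(t)$ we obtain

$$\mu'(t) = t\,[2\lambda'(t) + t\lambda''(t)] = t\left[\frac{2\theta'(t)}{\theta(t)} + \frac{t\,\theta''(t)}{\theta(t)} - \frac{t\,[\theta'(t)]^2}{[\theta(t)]^2}\right],$$

whence

$$2\lambda'(t) + t\lambda''(t) = \frac{2\theta'(t)}{\theta(t)} + \frac{t\,\theta''(t)}{\theta(t)} - \frac{t\,[\theta'(t)]^2}{[\theta(t)]^2} = \chi(t) \ge 0;$$

that is,

$$t\lambda''(t) \ge -2\lambda'(t), \qquad t\,\theta''(t) \ge -2\theta'(t) + \frac{t\,[\theta'(t)]^2}{\theta(t)}.$$

It follows that for $t < 0$, we have

$$\lambda''(t) \le -2\lambda'(t)/t$$

and

$$\theta''(t) \le -\frac{2\theta'(t)}{t} + \frac{[\theta'(t)]^2}{\theta(t)},$$

while for $t > 0$, we have

$$\lambda''(t) \ge -2\lambda'(t)/t$$

and

$$\theta''(t) \ge -\frac{2\theta'(t)}{t} + \frac{[\theta'(t)]^2}{\theta(t)}.$$

² See for instance, G. Pólya und G. Szegő, Aufgaben und Lehrsätze aus der Analysis
(Berlin, 1925), Vol. 1, pp. 54-56 and 210-211.

³ See G. Pólya und G. Szegő, loc. cit., p. 210.

⁴ See G. Pólya und G. Szegő, loc. cit., p. 54.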
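
The bounds on the second derivative can be checked numerically. The following is a minimal sketch, not taken from the paper: it uses the simple sample form (1) in place of the integral form, for which the same argument applies with sums replacing integrals, and it estimates the derivatives by central differences. The quantity `bound_gap` is $t\theta'' + 2\theta' - t\theta'^2/\theta$, which by the argument above should equal $\theta(t)\chi(t) \ge 0$, where $\chi$ is the variance of $\log x$ under weights proportional to $x^t$. All names and the step size are assumptions made for the illustration.

```python
import math

def theta(xs, t):
    """Simple sample form ((x_1^t + ... + x_n^t) / n)^(1/t), for t != 0."""
    return (sum(x ** t for x in xs) / len(xs)) ** (1.0 / t)

def derivatives(xs, t, h=1e-4):
    """Central-difference estimates of theta(t), theta'(t) and theta''(t)."""
    f0, fp, fm = theta(xs, t), theta(xs, t + h), theta(xs, t - h)
    return f0, (fp - fm) / (2.0 * h), (fp - 2.0 * f0 + fm) / (h * h)

def chi(xs, t):
    """Variance of log x under weights proportional to x^t (nonnegative)."""
    w = [x ** t for x in xs]
    d = sum(w)
    mean = sum(wi * math.log(x) for wi, x in zip(w, xs)) / d
    mean_sq = sum(wi * math.log(x) ** 2 for wi, x in zip(w, xs)) / d
    return mean_sq - mean * mean

def bound_gap(xs, t):
    """t*theta'' + 2*theta' - t*theta'^2/theta; expected to equal theta*chi."""
    th, d1, d2 = derivatives(xs, t)
    return t * d2 + 2.0 * d1 - t * d1 * d1 / th
```

A nonnegative gap for $t > 0$ is the lower bound on $\theta''(t)$, and for $t < 0$ (after dividing by the negative $t$) it is the upper bound.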





A CHARACTERIZATION OF THE NORMAL DISTRIBUTION 

By Eugene Lukacs
Baltimore, Md.

1. In sampling from a normal population the distributions of the mean and
of the variance are mutually independent. This well known property of the
normal distribution is used in deriving the distribution of "Student's" ratio.
The independence of the distributions of the mean and of the variance characterizes
the normal distribution. To show this one has to prove the following
statement:

A necessary and sufficient condition for the normality of the parent distribution
is that the sampling distributions of the mean and of the variance be independent.

That this condition is necessary follows from the above mentioned property
of the normal distribution; so there is only to prove that this condition is sufficient.
This was first proved by R. C. Geary¹ by using some of R. A. Fisher's
general formulae for the seminvariants. However, a different proof, using
characteristic functions, might be of some interest.

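¹ R. C. Geary, "Distribution of Student's ratio for nonnormal samples," Roy. Stat.
Soc. Jour., Supp. Vol. 3, no. 2.

2. Let $f(x)$ be the density function of a continuous probability distribution
and let $x_1, x_2, \cdots, x_n$ be $n$ observations of the variate $x$. Denote by

$$\bar x = \sum_{\alpha=1}^{n} x_\alpha / n$$

the sample mean, and by

$$s^2 = \frac{1}{n}\sum_{\alpha=1}^{n} (x_\alpha - \bar x)^2 = \Big[(n - 1)\sum_{\alpha=1}^{n} x_\alpha^2 - 2\sum_{\alpha=1}^{n-1}\sum_{\beta=1}^{n-\alpha} x_\alpha x_{\alpha+\beta}\Big]\Big/ n^2$$

the sample variance of those observations. The characteristic function of the
distribution is then given by

(1) $$\varphi(t) = \int e^{itx} f(x)\, dx.$$

The characteristic function of the joint distribution of the statistics $\bar x$ and $s^2$
is known to be

(2) $$\varphi(t_1, t_2) = \int \cdots \int e^{it_1 \bar x + it_2 s^2} f(x_1) \cdots f(x_n)\, dx_1 \cdots dx_n.$$

In the same way one obtains the characteristic function of the mean $\bar x$ as

(2a) $$\varphi_1(t_1) = \varphi(t_1, 0) = \int \cdots \int e^{it_1 \bar x} f(x_1) \cdots f(x_n)\, dx_1 \cdots dx_n,$$

and the characteristic function of the variance $s^2$ as

(2b) $$\varphi_2(t_2) = \varphi(0, t_2) = \int \cdots \int e^{it_2 s^2} f(x_1) \cdots f(x_n)\, dx_1 \cdots dx_n.$$

The independence of the distributions of $\bar x$ and $s^2$ means in terms of the characteristic
functions

(3) $$\varphi(t_1, t_2) = \varphi_1(t_1)\, \varphi_2(t_2).$$

Substituting in (2a) $\bar x = \sum_\alpha x_\alpha / n$, it is seen that

$$\varphi_1(t_1) = \prod_{\alpha=1}^{n} \int e^{it_1 x_\alpha / n} f(x_\alpha)\, dx_\alpha = \Big[\int e^{it_1 x / n} f(x)\, dx\Big]^n = [\varphi(t_1/n)]^n,$$

therefore

(3') $$\frac{\partial \varphi(t_1, t_2)}{\partial t_2}\bigg|_{t_2 = 0} = [\varphi(t_1/n)]^n\, \frac{d\varphi_2}{dt_2}\bigg|_{t_2 = 0}.$$

Differentiating (2) with respect to $t_2$ gives

$$\frac{\partial \varphi(t_1, t_2)}{\partial t_2} = i \int \cdots \int s^2 e^{it_1 \bar x + it_2 s^2} f(x_1) \cdots f(x_n)\, dx_1 \cdots dx_n.$$

Substituting $s^2 = \big[(n - 1)\sum_\alpha x_\alpha^2 - 2\sum_{\alpha}\sum_{\beta} x_\alpha x_{\alpha+\beta}\big]/n^2$ we
obtain easily

(4a) $$\frac{\partial \varphi(t_1, t_2)}{\partial t_2}\bigg|_{t_2 = 0} = i\,\frac{n - 1}{n}\Big\{[\varphi(t_1/n)]^{n-1} \int x^2 e^{it_1 x/n} f(x)\, dx - [\varphi(t_1/n)]^{n-2} \Big[\int x e^{it_1 x/n} f(x)\, dx\Big]^2\Big\}.$$

In a similar way it is seen

(4b) $$\frac{d\varphi_2}{dt_2}\bigg|_{t_2 = 0} = i\,\frac{n - 1}{n}\, \sigma^2.$$

Here $\sigma^2$ denotes the population variance of the parent distribution. Substituting
(4a) and (4b) in the relation (3') and writing $t = t_1/n$ one has

(5) $$\varphi(t) \int x^2 e^{itx} f(x)\, dx - \Big[\int x e^{itx} f(x)\, dx\Big]^2 = \sigma^2 [\varphi(t)]^2.$$

Considering the definition (1) of the characteristic function it is seen that

(6) $$\frac{d^k \varphi}{dt^k} = i^k \int x^k e^{itx} f(x)\, dx.$$

The integrals on the left side of relation (5) are of this form. So one may write
the relation expressing statistical independence of the sample mean and the
sample variance as a differential equation for the characteristic function,
namely

(7) $$\varphi(t)\, \varphi''(t) - [\varphi'(t)]^2 = -\sigma^2 [\varphi(t)]^2.$$

The initial conditions to be satisfied are

(7a) $$\varphi(0) = 1, \qquad \varphi'(0) = i\mu,$$

where $\mu$ is the population mean of the parent distribution. Integrating this
equation it is seen that the characteristic function is

(8) $$\varphi(t) = e^{i\mu t - \sigma^2 t^2/2},$$

which is the characteristic function of the normal distribution.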
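
A small numerical sketch may make the role of the differential equation concrete. It evaluates the residual $\varphi\varphi'' - \varphi'^2 + \sigma^2\varphi^2$, which vanishes wherever the equation holds, for the normal characteristic function and, for contrast, for the exponential distribution with unit mean (an example chosen here, not taken from the paper). The closed-form derivatives and all function names are assumptions of the illustration.

```python
import cmath

def normal_cf(t, mu, sigma2):
    """Normal characteristic function with its first two derivatives in t."""
    phi = cmath.exp(1j * mu * t - 0.5 * sigma2 * t * t)
    a = 1j * mu - sigma2 * t
    return phi, a * phi, (a * a - sigma2) * phi

def exponential_cf(t):
    """cf 1/(1 - it) of the unit-mean exponential law (variance 1) and derivatives."""
    u = 1.0 / (1.0 - 1j * t)
    return u, 1j * u ** 2, -2.0 * u ** 3

def residual(phi, d1, d2, sigma2):
    """phi*phi'' - phi'^2 + sigma^2*phi^2; zero where the equation is satisfied."""
    return phi * d2 - d1 * d1 + sigma2 * phi * phi
```

For the normal law the residual is zero for every $t$; for the exponential law it vanishes at $t = 0$ only, in line with the statement that independence of mean and variance singles out the normal distribution.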


3. This reasoning applies also to the multivariate case. Let $f(x_1, x_2, \cdots, x_p)$
be the density of the $p$ variates $x_1, x_2, \cdots, x_p$. Denote by $x_{k\alpha}$ ($k = 1,
2, \cdots, p$; $\alpha = 1, 2, \cdots, n$) the $\alpha$-th observation on the $k$-th variate, by $\bar x_k$
the sample mean of this variate and by $s_{lm}$ the sample covariance between the
$l$-th and $m$-th variates. Assuming that the distribution of $s_{lm}$ is independent of
the joint distribution of the $p$ sample means $(\bar x_1, \bar x_2, \cdots, \bar x_p)$ one obtains the
equation

(9) $$\frac{\varphi_{lm}}{\varphi} - \frac{\varphi_l\, \varphi_m}{\varphi^2} = -\sigma_{lm}.$$

Here $\sigma_{lm}$ is the population covariance of the variates $x_l$ and $x_m$,

$$\varphi(t_1, \cdots, t_p) = \int \cdots \int e^{i(t_1 x_1 + \cdots + t_p x_p)} f(x_1, \cdots, x_p)\, dx_1 \cdots dx_p$$

denotes the characteristic function of the parent distribution and

$$\varphi_l = \frac{\partial \varphi}{\partial t_l}, \qquad \varphi_{lm} = \frac{\partial^2 \varphi}{\partial t_l\, \partial t_m}.$$

If (9) holds for $l, m = 1, 2, \cdots, p$ one has a system of partial differential equations
which leads to the characteristic function of the multivariate normal
distribution.





NOTE ON A METHOD OF SAMPLING
By Carlos E. Dieulefait
National University of Litoral, Argentina

Olds¹ has considered the following problem: Given a lot of size $m = r + s$
containing $s$ items of a specified kind. Items are drawn without replacement until
$j$ of the $s$ items have been drawn. The problem is to determine the probability law
of $n$, the number of drawings which have to be made. In the present note, we shall
consider a certain limiting form for the probability function of $n$ and make
some remarks concerning repeated sampling of this type.
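
The sampling scheme just described can be made concrete with a short exact computation. The sketch below is not from the note: it uses the elementary combinatorial form of the probability law, $P(n) = \binom{n-1}{j-1}\binom{r+s-n}{s-j}\big/\binom{r+s}{s}$ for $j \le n \le r + j$ (an assumption of this illustration, obtained by counting the positions of the $s$ specified items), and checks the known mean $j(r+s+1)/(s+1)$ and variance $jr(r+s+1)(s-j+1)/((s+1)^2(s+2))$ of this distribution. Function names are hypothetical.

```python
from fractions import Fraction
from math import comb

def pmf(n, r, s, j):
    """Probability that the j-th specified item appears at drawing n."""
    if n < j or n > r + j:
        return Fraction(0)
    return Fraction(comb(n - 1, j - 1) * comb(r + s - n, s - j), comb(r + s, s))

def mean_drawings(r, s, j):
    """Exact first moment of n."""
    return sum(n * pmf(n, r, s, j) for n in range(j, r + j + 1))

def variance_drawings(r, s, j):
    """Exact second moment of n about its mean."""
    m1 = mean_drawings(r, s, j)
    return sum((n - m1) ** 2 * pmf(n, r, s, j) for n in range(j, r + j + 1))
```

Exact rational arithmetic is used so that the moments can be compared with the closed forms without rounding error.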

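If $n$ is the size of a drawing, $j \le n \le r + j$, its probability law $P(n)$ is given by:

$$P(n) = \binom{r}{n - j} \frac{\Gamma(s + 1)}{\Gamma(j)\,\Gamma(s - j + 1)} \int_0^1 x^{n-1} (1 - x)^{r+s-n}\, dx.$$

The characteristic function of $n$ is

$$\varphi(t, n) = \sum_{n=j}^{r+j} P(n)\, e^{itn} = \frac{\Gamma(s + 1)}{\Gamma(j)\,\Gamma(s - j + 1)}\, e^{ijt} \int_0^1 x^{j-1} (1 - x)^{s-j} (1 - x + x e^{it})^r\, dx.$$

Differentiating we find

(1) $$\frac{\varphi'(t, n)}{\varphi(t, n)} = ij + i r e^{it}\, \frac{\int_0^1 x^{j} (1 - x)^{s-j} (1 - x + x e^{it})^{r-1}\, dx}{\int_0^1 x^{j-1} (1 - x)^{s-j} (1 - x + x e^{it})^{r}\, dx}$$

and hence

$$m_1(n) = \sum_{n=j}^{r+j} P(n)\, n = \frac{1}{i}\big(\varphi'(t, n)\big)_{t=0} = \frac{j(r + s + 1)}{s + 1}.$$

For the calculation of moments about the mean we take

(2) $$\varphi(t, n - m_1) = e^{-i m_1 t}\, \varphi(t, n),$$

from which we obtain

$$m_k(n) = \sum_{n=j}^{r+j} P(n)(n - m_1)^k = \frac{1}{i^k}\left(\frac{d^k \varphi(t, n - m_1)}{dt^k}\right)_{t=0}.$$

In particular, $m_2(n) = \dfrac{jr(r + s + 1)(s - j + 1)}{(s + 1)^2 (s + 2)}$. The values of $m_1(n)$ and $m_2(n)$ have
already been given by Olds using another method. Putting - m we have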

8 d' 1 


¹ E. G. Olds, Annals of Math. Stat., Vol. 11 (1940), p. 355.









Ml - 0) + 


/ O', 2). 

(s -1-1, 2) 


3(1 - ffU + 0(2 - 30 -f- M) + (7:^^) 


^ « (0 - - (11 + 4/3 -f GM)mi - i3(6 - 11|3 + 6^ + 0‘) + ■ 


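where $r^{(k)} = r(r - 1) \cdots (r - k + 1)$, $(j, k) = j(j + 1) \cdots (j + k - 1)$.²

We can obtain a limiting form for $P(n)$ in the following way:
Since

$$\varphi\left(t, \frac{n - j}{r}\right) = \frac{\Gamma(s + 1)}{\Gamma(j)\,\Gamma(s - j + 1)} \int_0^1 x^{j-1} (1 - x)^{s-j} (1 - x + x e^{it/r})^r\, dx,$$

we find

$$\lim_{r \to \infty} (1 - x + x e^{it/r})^r = e^{itx}.$$

Therefore

(3) $$\lim_{r \to \infty} \varphi\left(t, \frac{n - j}{r}\right) = \int_0^1 e^{itx} P(x)\, dx,$$

where

$$P(x) = \frac{\Gamma(s + 1)}{\Gamma(j)\,\Gamma(s - j + 1)}\, x^{j-1} (1 - x)^{s-j}.$$

The interpretation of (3) is that the distribution $\{(n - j)/r;\ P(n)\}$ has as its
limiting form the distribution $\{z, P(z)\}$, $0 \le z \le 1$, as $r \to \infty$.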
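
The limiting statement can be illustrated numerically. In the sketch below (an illustration, not part of the note) the exact probability of $n$ is taken in the combinatorial form $\binom{n-1}{j-1}\binom{r+s-n}{s-j}\big/\binom{r+s}{s}$, which is an assumption of the example, and $r\,P(n)$ is compared with the density $\Gamma(s+1)/(\Gamma(j)\Gamma(s-j+1))\, z^{j-1}(1-z)^{s-j}$ at $z = (n - j)/r$ for a large value of $r$. Function names are hypothetical.

```python
from math import comb, gamma

def pmf(n, r, s, j):
    """Exact probability that the j-th specified item appears at drawing n."""
    if n < j or n > r + j:
        return 0.0
    return comb(n - 1, j - 1) * comb(r + s - n, s - j) / comb(r + s, s)

def limiting_density(z, s, j):
    """Limiting density of z = (n - j)/r on the interval (0, 1)."""
    const = gamma(s + 1) / (gamma(j) * gamma(s - j + 1))
    return const * z ** (j - 1) * (1.0 - z) ** (s - j)

def scaled_pmf(z, r, s, j):
    """r * P(n) at the drawing count n nearest to j + r*z."""
    n = j + round(r * z)
    return r * pmf(n, r, s, j)
```

With $r = 2000$, $s = 4$, $j = 2$ and $z = 0.3$ the two quantities differ by roughly one part in a thousand; the agreement improves as $r$ grows.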

Letting $n_1, n_2, \cdots, n_w$ be a sample of size $w$ and $\bar n$ the mean, $\bar n = \frac{1}{w}\sum_{i=1}^{w} n_i$.

For the characteristic function of $\bar n$ we have

$$\varphi(t, \bar n) = \left[\varphi\left(\frac{t}{w}, n\right)\right]^w$$

and hence 

€(h£ - 

² For the method of calculation cf. C. Dieulefait, Comptes Rendus, Vol.
208, p. 146.







For f = 0 we have miC;!) «= Wj)?. Hut; 


rf“ At, A) i 

. 33^ ^ 

fit sr(t, fi) »■“ 


Then for i = 0 we arrive at 


For a = 1, wc have 


/*»<■ 


if ft) 


d" tf';'?. Jij 

.'/i" v;(£. 


»r“ 






tc 


and this leads us to

$$m_2(\bar n) = \frac{jr(r + s + 1)(s + 1 - j)}{w(s + 1)^2 (s + 2)}.$$

By the Tchebycheff theorem we obtain

$$P\left(|\bar n - m_1(n)| < l\sqrt{m_2(\bar n)}\right) > 1 - \frac{1}{l^2}.$$

We can take $l$ and $w$ as large as we please; then we have the following stochastic
limit

$$\lim \bar n = m_1(n).$$


Now, we have 


and 




^UO ^ _j_ 

*w(f) irx <r» 1 


Remembering (1) we readily obtain 
,, f- 4. (/„- l)iO' jLl) 4. i 1 a. 

^ ‘rlwl (s + 1)^ ^ (g + lT(a + 2) «~+ ij ^ 

^ VBteS + 1 ^ 


t + 


Thus, we find 


1 + i • 
«*««»# + 1 


lim , 





This rf.Milt implies that the rlistribution ^; p(n)l has the limiting 


nonna! flistrihutifin 




era 


as w 


A SEQUENCE OF DISCRETE VARIABLES EXHIBITING CORRELATION 
DUE TO COMMON ELEMENTS 

By Carl H. Fischer 

University of Michigan

1. Introduction. Studies of correlation due to common elements have been
made more or less sporadically over the past thirty years in attempts to throw
more light on the meaning of correlation. Numerous examples may be cited.
One of the earliest was a study by Kapteyn [1] in which he showed that two
sums, each of $n$ elements drawn from a normal population with $k$ elements in
common, had a correlation coefficient of $k/n$. This was considerably generalized
by the writer [3] who considered sums of different numbers of elements drawn
from quite arbitrary continuous distributions. The work was extended to include
sequences of three or more such sums. Antedating this latter paper,
Rietz [2] has devised various urn schemata in one of which pairs of drawings of $s$
balls each were produced with $t$ balls held in common. The coefficient of
correlation between the numbers of white balls in each of the pairs of drawings
was found to be $t/s$.

Fairly recently some interest has been shown in this subject in connection
with the study of heredity; hence it appeared that it might be of value to present
the following study by elementary methods of a sequence of discrete variables
in which each member is linked to the adjacent members by various specified
numbers of common elements.

2. Two variables. A pair of discrete variables is defined as follows: The
first, $x$, is equal to the number of white balls in a set of $s_1$ balls drawn one at a
time from an urn which is so maintained that the probability of drawing a white
ball is always a constant, $p$. The second, $y$, is equal to the number of white
balls in a second set of $s_2$ balls formed by drawing $t_{12}$ balls at random from the $s_1$
balls of the first set plus $s_2 - t_{12}$ balls drawn directly from the urn. The numbers
$s_1$ and $s_2$ may or may not be equal.

Evidently the marginal distribution of $x$ follows the Bernoulli law and is given
by $\binom{s_1}{x} p^x q^{s_1 - x}$,¹ where $q = 1 - p$. The procedure in finding $P(x, y : t_{12})$, the bivariate distribution
function of $x$ and $y$ with $t_{12}$ balls in common between the two drawings, is to
write the product of the three probabilities: of obtaining $x$ white balls in the
first set; of drawing $d$ of these whites in the $t_{12}$ balls chosen at random from this
set; of drawing exactly $y - d$ white balls among the $s_2 - t_{12}$ balls drawn directly
from the urn to complete the second set. This product may readily be reduced
to the form shown below in (1), symmetric in $x$ and $y$ and in $s_1$ and $s_2$, which
is then summed on $d$ from 0 to $t_{12}$. Thus

(1) $$P(x, y : t_{12}) = \sum_{d=0}^{t_{12}} \binom{t_{12}}{d} \binom{s_1 - t_{12}}{x - d} \binom{s_2 - t_{12}}{y - d}\, p^{x + y - d}\, q^{s_1 + s_2 - t_{12} - x - y + d}.$$

¹ By $\binom{a}{b}$ is meant the number of combinations of $a$ items taken $b$ at a time. It shall be
understood that $\binom{a}{b} = 0$ if $b < 0$ or $b > a$.

The marginal distribution of $x$ has already been given. From the symmetry
of (1) it is obvious that the corresponding marginal distribution of $y$ must be
characterized by the Bernoulli distribution function $\binom{s_2}{y} p^y q^{s_2 - y}$. The variances
of the marginal distributions are $s_1 pq$ and $s_2 pq$, respectively.

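We next proceed to demonstrate that both of the regression curves are linear
and to find the equations of the lines. Consider an array of $x$ on $y$ for some
fixed value of $y$. The mean of the array is

(2) $$\bar x_y = \frac{\sum_{x=0}^{s_1} x\, P(x, y : t_{12})}{\binom{s_2}{y} p^y q^{s_2 - y}}.$$

The summation in the right member of (2) may be expanded and then rewritten
as

(3) $$\sum_{x=0}^{s_1} x\, P(x, y : t_{12}) = \sum_{d=0}^{t_{12}} \binom{t_{12}}{d} \binom{s_2 - t_{12}}{y - d} p^{y} q^{s_2 - y} \Big[\sum_{x} x \binom{s_1 - t_{12}}{x - d} p^{x - d} q^{s_1 - t_{12} - x + d}\Big].$$

The inner summation in (3) is seen to equal $d + p(s_1 - t_{12})$ and hence (2)
becomes

$$\bar x_y = \sum_{d=0}^{t_{12}} \big[d + p(s_1 - t_{12})\big] \binom{t_{12}}{d} \binom{s_2 - t_{12}}{y - d} \Big/ \binom{s_2}{y}.$$

Then the equation of the line of regression of $x$ on $y$ becomes

(4) $$\bar x_y = t_{12}\, y / s_2 + p(s_1 - t_{12}).$$

By symmetry, the line of regression of $y$ on $x$ may be seen to be

(5) $$\bar y_x = t_{12}\, x / s_1 + p(s_2 - t_{12}).$$

The square of the correlation coefficient is equal to the product of the slopes
of the two regression lines, hence

$$r_{xy} = t_{12} / (s_1 s_2)^{1/2}.$$

If $s_1 = s_2 = s$ we have the familiar result $t_{12}/s$.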
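
The two-variable results lend themselves to a direct check. The following is a minimal sketch, assuming the symmetric form of the bivariate distribution described above, namely the sum over $d$ of $\binom{t_{12}}{d}\binom{s_1 - t_{12}}{x - d}\binom{s_2 - t_{12}}{y - d}\, p^{x+y-d} q^{s_1+s_2-t_{12}-x-y+d}$; it computes the correlation coefficient and one array mean exactly (up to floating point) and compares them with $t_{12}/(s_1 s_2)^{1/2}$ and $t_{12} y/s_2 + p(s_1 - t_{12})$. Function names are hypothetical.

```python
from math import comb, sqrt

def joint_pmf(x, y, s1, s2, t, p):
    """Bivariate probability of x and y whites with t balls in common."""
    q = 1.0 - p
    lo = max(0, x - (s1 - t), y - (s2 - t))
    hi = min(t, x, y)
    total = 0.0
    for d in range(lo, hi + 1):  # d = whites among the common balls
        total += (comb(t, d) * comb(s1 - t, x - d) * comb(s2 - t, y - d)
                  * p ** (x + y - d) * q ** (s1 + s2 - t - x - y + d))
    return total

def correlation(s1, s2, t, p):
    """Correlation coefficient of x and y computed from the joint distribution."""
    ex = ey = exx = eyy = exy = 0.0
    for x in range(s1 + 1):
        for y in range(s2 + 1):
            w = joint_pmf(x, y, s1, s2, t, p)
            ex += w * x; ey += w * y
            exx += w * x * x; eyy += w * y * y; exy += w * x * y
    return (exy - ex * ey) / sqrt((exx - ex * ex) * (eyy - ey * ey))

def array_mean_x(y, s1, s2, t, p):
    """Mean of x in the array with the given y."""
    num = sum(x * joint_pmf(x, y, s1, s2, t, p) for x in range(s1 + 1))
    den = sum(joint_pmf(x, y, s1, s2, t, p) for x in range(s1 + 1))
    return num / den
```

For example, with $s_1 = 6$, $s_2 = 4$, $t_{12} = 3$ and $p = 0.3$ the correlation is $3/\sqrt{24}$, independent of $p$.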





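3. Three variables. A third variable, $z$, may now be defined as the number
of white balls in a set of $s_3$ balls formed by drawing $t_{23}$ balls at random from the
$s_2$ of the second set plus $s_3 - t_{23}$ drawn directly from the urn. It is evident from
the results on two variables that the marginal distribution of $z$ follows the
Bernoulli law and that the equations of the regression lines of $z$ on $y$ and $y$
on $z$ are

$$\bar z_y = t_{23}\, y / s_2 + p(s_3 - t_{23});$$

$$\bar y_z = t_{23}\, z / s_3 + p(s_2 - t_{23}).$$

The correlation coefficient, $r_{yz}$, is equal to $t_{23}/(s_2 s_3)^{1/2}$.

The relationship between $x$ and $z$ remains to be investigated. The probability
of the joint occurrence of $x$ whites on the first drawing and $z$ whites on
the third when it is specified that the $s_1$ and $s_3$ balls of the two sets shall include
the same $g$ balls in common is given by the right member of (1) with $g$, $z$, and $s_3$
replacing $t_{12}$, $y$, and $s_2$, respectively. When this expression is multiplied by
the probability that the first and third sets do contain exactly $g$ balls in common
and the product is summed on $g$ over the range 0 to $t_{12}$, we have $P(x, z : t_{12}, t_{23})$,
the bivariate distribution function of $x$ and $z$. Thus

(6) $$P(x, z : t_{12}, t_{23}) = \sum_{g=0}^{t_{12}} \frac{\binom{t_{12}}{g} \binom{s_2 - t_{12}}{t_{23} - g}}{\binom{s_2}{t_{23}}} \sum_{d=0}^{g} \binom{g}{d} \binom{s_1 - g}{x - d} \binom{s_3 - g}{z - d}\, p^{x + z - d}\, q^{s_1 + s_3 - g - x - z + d}.$$

The mean of the array of $x$ on $z$ for any fixed $z$ may be written, after inverting
the order of summation:

(7) $$\bar x_z = \sum_{g=0}^{t_{12}} \frac{\binom{t_{12}}{g} \binom{s_2 - t_{12}}{t_{23} - g}}{\binom{s_2}{t_{23}}} \left[\frac{\sum_{x=0}^{s_1} x\, P(x, z : g)}{\binom{s_3}{z} p^z q^{s_3 - z}}\right].$$

The expression within the square brackets of (7) is identical in form with the
right member of (2), and hence we now have

$$\bar x_z = \sum_{g=0}^{t_{12}} \big[g z / s_3 + p(s_1 - g)\big] \binom{t_{12}}{g} \binom{s_2 - t_{12}}{t_{23} - g} \Big/ \binom{s_2}{t_{23}}.$$

This reduces readily to

(8) $$\bar x_z = \frac{t_{12} t_{23}}{s_2 s_3}\, z + \frac{s_1 s_2 - t_{12} t_{23}}{s_2}\, p.$$

By symmetry,

$$\bar z_x = \frac{t_{12} t_{23}}{s_1 s_2}\, x + \frac{s_2 s_3 - t_{12} t_{23}}{s_2}\, p.$$

The coefficient of correlation between $x$ and $z$ is found to be

(9) $$r_{xz} = \frac{t_{12} t_{23}}{s_2 (s_1 s_3)^{1/2}}.$$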



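It will be observed that

(10) $$r_{xz} = r_{xy}\, r_{yz}.$$

Interesting relationships also exist among the partial and multiple correlation
coefficients and the regression coefficients. It will be convenient here to
measure each variate from its mean and to replace the subscripts $x$, $y$, and $z$
on $r$ by 1, 2, and 3, respectively. Then the multiple regression equation of each
variable on the other two may be conveniently expressed in terms of the cofactors
of the correlation determinant. From the results found in the writer's paper [4] for
the case where each element $r_{ij}$ of the correlation determinant may be expressed
as the product $r_{i,i+1}\, r_{i+1,i+2} \cdots r_{j-1,j}$, we now have

$$R_{11} = 1 - r_{23}^2, \qquad R_{12} = -r_{12}(1 - r_{23}^2), \qquad R_{13} = 0,$$

$$R_{22} = 1 - r_{13}^2, \qquad R_{23} = -r_{23}(1 - r_{12}^2), \qquad R_{33} = 1 - r_{12}^2.$$

Then the regression planes of $x$ on $y$ and $z$ and of $z$ on $x$ and $y$ are given,
respectively, by

$$x = r_{12}\, \frac{\sigma_1}{\sigma_2}\, y = \frac{t_{12}}{s_2}\, y, \qquad z = r_{23}\, \frac{\sigma_3}{\sigma_2}\, y = \frac{t_{23}}{s_2}\, y.$$

The regression plane of $y$ on $x$ and $z$ is

$$y = \frac{r_{12}(1 - r_{23}^2)}{1 - r_{12}^2 r_{23}^2}\, \frac{\sigma_2}{\sigma_1}\, x + \frac{r_{23}(1 - r_{12}^2)}{1 - r_{12}^2 r_{23}^2}\, \frac{\sigma_2}{\sigma_3}\, z = \frac{s_2 t_{12}(s_2 s_3 - t_{23}^2)}{s_1 s_2^2 s_3 - t_{12}^2 t_{23}^2}\, x + \frac{s_2 t_{23}(s_1 s_2 - t_{12}^2)}{s_1 s_2^2 s_3 - t_{12}^2 t_{23}^2}\, z.$$

The three multiple correlation coefficients are

(11) $$r_{1.23} = r_{12}, \qquad r_{3.12} = r_{23}, \qquad r_{2.13} = \left[1 - \frac{(1 - r_{12}^2)(1 - r_{23}^2)}{1 - r_{12}^2 r_{23}^2}\right]^{1/2}.$$

The partial correlation coefficients are

(12) $$r_{12.3} = r_{12}\left[\frac{1 - r_{23}^2}{1 - r_{12}^2 r_{23}^2}\right]^{1/2}, \qquad r_{13.2} = 0, \qquad r_{23.1} = r_{23}\left[\frac{1 - r_{12}^2}{1 - r_{12}^2 r_{23}^2}\right]^{1/2}.$$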
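
The consequences of the product relation $r_{13} = r_{12} r_{23}$ can be verified from the standard definitions of multiple and partial correlation. The sketch below is an illustration under that assumption (the determinant and cofactor formulas are the usual ones for a three-variable correlation matrix, not quoted from the paper); it confirms that the multiple correlation of the first variable on the other two reduces to $r_{12}$ and that the partial correlation of the first and third variables, the second being held fixed, is zero. Function names are hypothetical.

```python
from math import sqrt

def simple_correlations(s1, s2, s3, t12, t23):
    """r12, r23 and r13 for three consecutive drawings."""
    r12 = t12 / sqrt(s1 * s2)
    r23 = t23 / sqrt(s2 * s3)
    r13 = t12 * t23 / (s2 * sqrt(s1 * s3))
    return r12, r23, r13

def multiple_and_partial(r12, r23, r13):
    """Standard three-variable formulas from the correlation determinant."""
    det = 1.0 - r12 ** 2 - r23 ** 2 - r13 ** 2 + 2.0 * r12 * r23 * r13
    r1_23 = sqrt(1.0 - det / (1.0 - r23 ** 2))  # multiple correlation of 1 on 2, 3
    r2_13 = sqrt(1.0 - det / (1.0 - r13 ** 2))  # multiple correlation of 2 on 1, 3
    r3_12 = sqrt(1.0 - det / (1.0 - r12 ** 2))  # multiple correlation of 3 on 1, 2
    r13_2 = (r13 - r12 * r23) / sqrt((1.0 - r12 ** 2) * (1.0 - r23 ** 2))
    r12_3 = (r12 - r13 * r23) / sqrt((1.0 - r13 ** 2) * (1.0 - r23 ** 2))
    return r1_23, r2_13, r3_12, r13_2, r12_3
```

The vanishing of $r_{13.2}$ expresses that the first and third drawings are related only through the second.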


4. k variables. A sequence of $k$ variables may be formed successively as were
the three considered above. It will be convenient here to designate the variables
by $x_i$ ($i = 1, 2, \cdots, k$). We also define $h_i$ as the total number of balls held in
common between the first and the $i$-th drawings. Then, as special cases,
$h_1 = s_1$ and $h_2 = t_{12}$.





The bivariate distribution functions, regression lines, and correlation coefficients
associated with any two consecutive variables in the sequence and with
any two variables separated by only one other variable can, from the preceding
results, be written at once.

It is not difficult to derive the bivariate distribution function for $x_1$ and $x_k$
by an extension of the method used in deriving (6). We then have


l*Ui , Xk‘.(n, (a - b-u) 

The equation of the line of regression of $x_1$ on $x_k$ is

$$\bar x_1 = \frac{\sum_{x_1=0}^{s_1} x_1\, P(x_1, x_k : t_{12}, t_{23}, \cdots, t_{k-1,k})}{\binom{s_k}{x_k} p^{x_k} q^{s_k - x_k}}.$$

This may be reduced, by repeated applications of the steps illustrated in the
corresponding case for three variables, to the form

(14) $$\bar x_1 = \frac{t_{12} t_{23} \cdots t_{k-1,k}}{s_2 s_3 \cdots s_k}\, x_k + \frac{s_1 s_2 \cdots s_{k-1} - t_{12} t_{23} \cdots t_{k-1,k}}{s_2 s_3 \cdots s_{k-1}}\, p.$$

By symmetry, we have

$$\bar x_k = \frac{t_{12} t_{23} \cdots t_{k-1,k}}{s_1 s_2 \cdots s_{k-1}}\, x_1 + \frac{s_2 s_3 \cdots s_k - t_{12} t_{23} \cdots t_{k-1,k}}{s_2 s_3 \cdots s_{k-1}}\, p.$$

Then the simple correlation coefficient between $x_1$ and $x_k$ is

(15) $$r_{1k} = \frac{t_{12} t_{23} \cdots t_{k-1,k}}{s_2 s_3 \cdots s_{k-1} (s_1 s_k)^{1/2}} = r_{12}\, r_{23} \cdots r_{k-1,k}.$$

It was shown by the writer [4] that for a sequence such as we are considering
the multiple correlation coefficient is a function only of the variables immediately
adjacent to the one considered, and that the partial correlation coefficient
is zero for any pairs except those of consecutive variables in the sequence. Thus,
the formulas given in terms of simple correlation coefficients for the case of a
sequence of three variables may be interpreted so as to cover the case for $k$
variables.
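
For a chain of $k$ drawings the closed forms can be evaluated mechanically. The sketch below is an illustration only: it takes the set sizes $s_1, \cdots, s_k$ and the overlaps $t_{12}, \cdots, t_{k-1,k}$ as lists, computes the correlation of the first and last variables both as $t_{12} \cdots t_{k-1,k} / (s_2 \cdots s_{k-1} (s_1 s_k)^{1/2})$ and as the product of the consecutive correlations, and returns the slope and intercept of the regression of $x_1$ on $x_k$. Function names and argument conventions are assumptions.

```python
from math import prod, sqrt

def end_correlation(s, t):
    """Correlation of x_1 and x_k from the closed form; s has k entries, t has k-1."""
    return prod(t) / (prod(s[1:-1]) * sqrt(s[0] * s[-1]))

def product_of_consecutive(s, t):
    """Product r_12 * r_23 * ... * r_{k-1,k} of the consecutive correlations."""
    return prod(ti / sqrt(a * b) for ti, a, b in zip(t, s, s[1:]))

def regression_x1_on_xk(s, t, p):
    """Slope and intercept of the regression line of x_1 on x_k."""
    slope = prod(t) / prod(s[1:])
    intercept = (prod(s[:-1]) - prod(t)) / prod(s[1:-1]) * p
    return slope, intercept
```

For $k = 3$ the slope and intercept reduce to those of the three-variable case, $t_{12} t_{23}/(s_2 s_3)$ and $(s_1 s_2 - t_{12} t_{23})\,p/s_2$.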


REFERENCES

[1] J. C. Kapteyn, "Definition of the correlation-coefficient," Monthly Notices Roy. Astron.
Soc., Vol. 72 (1912), pp. 518-525.

[2] H. L. Rietz, "Urn schemata as a basis for the development of correlation theory,"
Annals of Math., Vol. 21 (1920), pp. 306-322.

[3] C. H. Fischer, "On correlation surfaces of sums with a certain number of random
elements in common," Annals of Math. Stat., Vol. 4 (1933), pp. 103-126.

[4] C. H. Fischer, "On multiple and partial correlation coefficients of a certain sequence
of sums," Annals of Math. Stat., Vol. 4 (1933), pp. 278-284.



REPORT OF THE NEW YORK MEETING OF THE INSTITUTE

The Seventh Annual Meeting of the Institute of Mathematical Statistics was
held from Saturday to Tuesday, December 27-30, 1941, in conjunction with
the meetings of the Allied Social Science Associations. With the exception of
the session on Tuesday afternoon, all sessions were held at the Biltmore Hotel.
The following one hundred seventy-seven members of the Institute attended*
the meeting:

P. L. AH, H, E. Arnold, K. J. Arm.W. h, A AMaii, K J ltrn'«.R W | L 

Battin, B. Bcnnolt, Carl Bemtoti. Jriacph Jk-r3s>ii««, IVin 1 I t' 

I, BliM,A. J.Bniiia,PaulHtweban,A, H IktwlH'r,!} H HT&tly, A h Hranli.tl H 

II. W Burgesa, J. H. Bualicy, Belle (’nldpron, B H t'antf*, 4 M W ti r*rJirait, 

A.C C'ohen,Jr.,M. 8. Ctolien, IwdnreColiii, 4 H tVHatan.L M <“■«»!.t» H <t r-.w^n, 
Gertrude Cox, G. C. CVaig, B. B. Day, I). B IW.ury, W Ikinsr*^, W 4 H 1 

Dodge, H. F. Dom, Paul Ilnrweiler, David Durand. 4, H DiHln. F S Dwxei. t'itwrbtU 
Eiaenliarl, W. F. Ellciti, 4. 8. Elalon, M. L. Klvebark. I> It HnH'dy, B l» !■ vaop,. Wjijv 
Feller, J. W. Ferlig, Irving Fisher, \\\ C. Haheriv. M M FFt-d, U M I. H 

Frankel, 11. A. Freeman, Ci. It, Online, Hihlw liWnnger, (* H Slraves, 4 \ UrernwiriTl, 

4,1. Griffin, G. C Grove, F. E. tlmblw, K. 4. t.luwlvl, M 4 ISagi-JiUt 4 IGndjM H 
Hansen, Myron lleidingafield, tklwardltidlyiO M H«}s(»er, ltir*»14 H<4ell(rig t A H«y* 
WilliamHurwiU, Seymour JnhUrn.W.W. Jaenlni, ItMehrl Jrow, A!vr«*t Kan'nfMUl*, Kssrl 
Karslon,LooKaU,C.J.Kicrnan,B,F. Kimball, A 4 King, 1. F Knudiwn. H S K<nHjn. 
TiallmgKoopmanB,R. L. Kosclka, A.K.Kurls, A K Kury.S M Kvirrel, Jaek l Kderman. 
Oscar Lange, D. ILLoavena, B. A. Lengyel, llowattnrf'Vene, Ma I,rvjn, M 4 tuBis. Irving 
Lorgo,A.J,Lotka,EugonoLuka<j«,a.A.LunillH*rg,l*.4.MrrarJhy,W U M>td«w,Ik'njft«Hri 
Mnlzborg, Henry Mann, Jakob Marsdiak, 4. W. Mtuueblj', {,». F, T Maver. Margarvl Merrvll, 
J. N,Miobie,J.It.Miner,NatbanMorriaon, J.E.M<»ri««,F C MosikUer, ^1 U SmWd, 
HaroldNiB8el8on,G,E,Nlver,M L. Noninn, Kilan Norris, 4. I Xnrtham, t' Ct Oaklev, 
E. G. Olds, P. S, Olmstcad, J. G. Osborne, R. F. Paawino, Edwanl P«nlit*»n. r" K Payne, 
ViotorPerlo, J.M. Porotti, L,M, Petit, Q, A,D. Preinnueh, Harry Pmi9i, LIriv Ratkirwitjt, 
L, J. Reod,?. V. Reno, J, S. Ripandelli, Selby Hobinwm, H G. A f* Ibiwiawler, 

Ernest Rubin, H A. Ruger, P, A. Samuelson, M. M. Sandomire, Max Sarndy, F E Saltvr- 
thwaito, Henry Soheffo, H. L, Schug, H. A, ^oriat, Nathan Seidert, W. A Sbvlton, H. %L 
Shephard, W. A Showhart.'U. M. Shulman, Harry Siller, R. R. Singleiinn, L E, Smart, 4. 
H.Smith,G.W.Snedooor,EmmaSpaney,MorUraer8plegelrafln,Arth«rSi«'tn,M B.Kitevena, 
4, S. Stock, M. M. Torrey, M. N, Torrey, W. R. Von V»rhi», D. F. Vntaw. 4r , W. V,. 
Waite, H, M. Walker, W. A. Wallis, A, N. Wataon, E. W, Wllann, C". P, WUnwr, Jawb 
Wolfowitz, M. A Woodbury, W, J, Youdon, Joseph Zubin, 

The opening session on Saturday afternoon on The little o/ o/ Nifnttonw 

in Biological Research was held jointly with the Biometric Section of the
American Statistical Association. Professor E. B. Wilson of the Harvard
School of Public Health acted as chairman. The session was in the form of a
round table discussion, the principal discussants being: W. Edwards Deming,
Bureau of the Census; Harold Hotelling, Columbia University; Lowell J. Reed,
Johns Hopkins University; and George W. Snedecor, Iowa State College.


* The list of attendance has been compiled from the regietotlott list auppUed by the 
Director of the New York Convention and Visitors Bureau. 




On Saturday evening, under the chairmanship of Dr. Walter A. Shewhart of
Bell Telephone Laboratories, a session was held jointly with the Econometric
Society on Theory of Runs and Confidence Intervals. The following program was
presented:

1. The theory of runs in random data.

Harold T. Davis, Northwestern University.

2. Time series significance tests based on signs of differences.

Geoffrey H. Moore, Rutgers University.

W. Allen Wallis, Stanford University.

3. Confidence intervals for the unknown median of any type of universe.

John H. Smith, University of Chicago.

The morning and afternoon sessions on Sunday on Numerical Computational
Devices were held jointly with the American Statistical Association, with the
cooperation of the Committee on Addresses in Applied Mathematics of the
American Mathematical Society. Dr. C. R. Langmuir of the Carnegie Foundation
for the Advancement of Teaching acted as chairman of the morning session
on Statistical and Matrix Calculation. The following papers were presented:

1. Some matrix methods in least square and other multivariate problems.

Harold Hotelling, Columbia University.

2. The Mallock electrical calculating machine for solving simultaneous linear equations.

Elizabeth Monroe Boggs, Cornell University.

3. Mathematical operations with punched cards.

J. C. McPherson, International Business Machines Corporation.

4. Recent developments in correlation technique.

Paul S. Dwyer, University of Michigan.

The subject of the afternoon session was Mechanical Solution of Differential
Equations. Dr. R. M. Foster of the Bell Telephone Laboratories presided for
the following program:

1. Punch card calculation of orbits.

W. J. Eckert, Naval Observatory.

2. Punch card methods for solving linear differential equations of second order.

Martin Schwarzschild, Columbia University.

3. Differential analyzers.

Harold L. Hazen, Massachusetts Institute of Technology.

Discussants:

L. S. Dederick, Aberdeen Proving Ground.

Norbert Wiener, Massachusetts Institute of Technology.

Professor Helen Walker of Columbia University held the chair at the Sunday
evening session, a joint session with the American Statistical Association. The
following program was given under the title: On Some Technical Aspects of
Sampling.

1. On the relative efficiencies of various areal sampling units in population inquiries.

M. H. Hansen, Bureau of the Census.

William Hurwitz, Bureau of the Census.

2. On the monthly sample survey of unemployment.

L. R. Frankel, Work Projects Administration.

J. S. Stock, Work Projects Administration.

3. On certain biases in surveys by questionnaire.

J. Cornfield, Bureau of Labor Statistics.

4. On the relation of probability to sampling.

W. G. Madow, Bureau of the Census.

5. Recent developments in sampling for agricultural statistics.

G. W. Snedecor, Iowa State College.

A. J. King, Iowa State College.

Discussants:

W. G. Cochran, Iowa State College.

J. A. Greenwood, Duke University.

Another joint session with the American Statistical Association was held on 
Monday morning. The topic considered was: What Can the Census Do With 
Sampling? Professor L. Edwin Smart of Ohio State University presided for the 
following program: 

1. An appraisal of the 1940 sampling scheme. 

T. O. Yntema and Dickson H. Leavens, Cowles Commission for Research in 
Economics. 

2. Some requirements of sampling design and presentation. 

W. Edwards Deming, Bureau of the Census. 

3. Compromises, losses, and gains brought about by the introduction of sampling. 

L. E. Truesdell, Bureau of the Census. 

4. The proposed annual sample census. 

Philip M. Hauser, Bureau of the Census. 

Discussants: 

A. N. Watson, Curtis Publishing Company. 

F. F. Stephan, Office of Production Management. 

S. A. Stouffer, University of Chicago. 

On Monday afternoon, a session was held for the reading of contributed 
papers on Probability and Statistics. Professor Harold Hotelling acted as 
chairman, and the following papers were read: 

1. Scanning data to determine significance of difference between frequency of an event in 
contrasted groups. 

Joseph Zubin, New York State Psychiatric Institute. 

2. Compounding probabilities from independent significance tests. 

W. Allen Wallis, Stanford University. 

3. A class of multivariate distributions. 

Walter Jacobs, Securities and Exchange Commission. 

4. Definition of the probable error. 

E. J. Gumbel, New School for Social Research. 

5. A generalized analysis of variance. 

F. E. Satterthwaite, University of Iowa. 

6. On the power function of the analysis of variance test. 

Abraham Wald, Columbia University. 

/JSonf « 9 w«h*(m 8 by hyperbolic and circular 

E. E. Blanche, Michigan State College. 





8. Additive partition functions. 

J. Wolfowitz, Columbia University. 

9. Limited type of probability distribution applied to flood flows (Preliminary report). 

B. F. Kimball, New York State Public Service Commission. 

Abstracts of these papers follow this report. 

Professor Harold Hotelling acted as chairman for the session on Tuesday 
morning, held jointly with the Econometric Society and the American Statistical 
Association. The program consisted of invited addresses on Recent Advances in 
Mathematical Statistics by Professors Burton H. Camp of Wesleyan University 
and Cecil C. Craig of the University of Michigan. 

The session on Tuesday afternoon was held at The Boyce Thompson Institute, 
Yonkers, New York. It was a joint session with the Biometrics Section of the 
American Statistical Association on The Design of Experiments. Dr. W. J. 
Youden of The Boyce Thompson Institute acted as chairman and had various 
experimental designs on display in the greenhouse. Through the courtesy of 
members of the Institute staff, transportation between the railroad station and 
the Institute was provided. After the program, tea was served. The following 
papers were read: 

1. Biological interpretation of interactions. 

W. C. Jacobs, Cornell University. 

2. Adapting the design to the experiment. 

Gertrude M. Cox, North Carolina State College. 

3. Sampling theory when the sampling units are of unequal size. 

W. G. Cochran, Iowa State College. 

4. Sampling errors of systematic and random surveys of cover type areas. 

J. G. Osborne, U. S. Forest Service. 

A luncheon meeting Monday noon was held jointly with the Econometric 
Society and was attended by ninety-four persons. Professor W. C. Mitchell 
of Columbia University presided and called on Irving Fisher, Harold Hotelling, 
W. G. Cochran, and W. A. Wallis for brief remarks. 

The annual business meeting of the Institute was held late Monday afternoon, 
with President Hotelling presiding. 

The report of the Secretary-Treasurer was read. The report appears on 
pp. 107-109. 

President Hotelling stated that Mr. George W. Petrie, III, had audited the 
books and records of the Treasurer and found them to be in agreement with 
the Report presented. 

Dr. Madow, who acted as teller, reported that the mail balloting had resulted 
in the election of the following officers for 1942: 

President: Professor C. C. Craig 

Vice-Presidents: Professor A. T. Craig 
Mr. E. C. Molina 

Secretary-Treasurer: Professor E. G. Olds 





After discussing various ways of broadening the service of the Institute, a 
motion was carried which recommended that the Board of Directors appoint 
committees to study the following matters: junior memberships, local chapters, 
and advertising for the official journal. Later the Board approved this 
recommendation and committees were appointed. 

Edwin G. Olds, 

Secretary 


REPORT OF THE DALLAS MEETING OF THE INSTITUTE 

The twelfth meeting of the Institute was held jointly with the meeting of 
Section A of the American Association for the Advancement of Science, and of 
the Econometric Society in Dallas on December 29-30, 1941. Professor 
Dunham Jackson, Secretary of Section A of the A. A. A. S., has kindly sent 
the following information regarding the meeting: 

Sessions of the joint meeting of the Institute of Mathematical Statistics 
with the Econometric Society and Section A of the A. A. A. S. were held Monday 
afternoon, December 29, and Tuesday morning and afternoon, December 30, 
at Southern Methodist University. The number of contributed papers offered 
on Tuesday was such as to cause extension of the session into the afternoon. 

On Monday afternoon addresses were delivered, in accordance with the 
programs issued in advance, by Professor A. B. Coble of the University of 
Illinois, retiring Vice President for the Section, on A Certain Set of Ten Points 
in Space, and Professor S. S. Wilks of Princeton University on Representative 
Sampling. 

The order of papers on Tuesday was as follows: 

1. On the theory of the tetrahedron. 

N. A. Court, University of Oklahoma. 

2. A method for integrating the linear hyperbolic equation in three independent variables. 

E. W. Titt, University of Texas. 

3. On powers of a matrix whose elements are sets of points. 

S. T. Sanders, Jr., Southwestern Louisiana Institute. 

4. Analytic theory of parametric linear partial differential equations. 

W. J. Trjitzinsky, University of Illinois. 

5. The theory of the Riesz integral. 

H. J. Ettlinger, University of Texas. 

6. Obtaining differences from tables which are in the form of punched cards. 

Harry Pelle Hartkemeier, University of Missouri. 

7. On investment and the saturation of capital. 

Montgomery D. Anderson, University of Florida. 

8. Advantages of singling out degrees of freedom in analyses of variances. 

W. D. Baten, Michigan State College. 

9. The incidence of an income tax on saving. 

Abram Bergson, University of Texas. 

10. Certain tests for randomness applied to data grouped in small sets. 

Edward L. Dodd, University of Texas. 





11. Stratified sampling (Preliminary Report). 

A. M. Mood, University of Texas. 

12. On convergence factors in convergent integrals. 

Charles N. Moore, University of Cincinnati. 

13. Geometric statement of a fundamental theorem for four-dimensional orthographic 
axonometry. 

W. H. Roever, Washington University. 

14. A certain non-metric Moore space. 

F. B. Jones, University of Texas. 

Abstracts of papers 8, 10, and 11 follow this report. 

Papers 1 to 8 inclusive on this list were presented Tuesday morning, and 
papers 9 to 14 at the afternoon session. In the absence of the authors, papers 
10 and 12 were read by title. 

The presiding officer Monday afternoon was Professor G. T. Whyburn of 
the University of Virginia, Chairman of the Section and Vice President of the 
A. A. A. S. On Tuesday Professor H. J. Ettlinger of the University of Texas 
presided for papers 1 to 4 inclusive, and Professor S. S. Wilks of Princeton 
University for the rest of the program. 

Edwin G. Olds, 

Secretary 


ANNUAL REPORT OF THE SECRETARY-TREASURER OF THE 

INSTITUTE 

On September 2-4, the Institute met at the University of Chicago, in conjunction 
with meetings of the American Mathematical Society, Mathematical 
Association of America, and Econometric Society. Sixty-eight members of the 
Institute attended the meeting. 

As mentioned in the 1940 report of the Secretary, the Institute became 
affiliated with the American Association for the Advancement of Science at the 
close of 1940. President Hotelling appointed Professor Truman L. Kelley as the 
representative of the Institute on the Executive Council of the A.A.A.S. for 1941. 

On December 29-30, 1941, the Institute held two joint sessions with Section A 
of the A.A.A.S. and the Econometric Society in connection with the Annual 
Meeting of the A.A.A.S. at Dallas, Texas. Professor Wilks gave an address 
at one of the sessions. The report of the Seventh Annual Meeting of the Institute 
appears on pp. 102-106. 

The Institute was invited to send an official representative to the Academic 
Festival of the University of Chicago, September 27-29, 1941. Mr. John F. 
Kenney was appointed as the representative of the Institute. 

During the past year, the Secretary has received a number of inquiries from 
members regarding opportunities for doing statistical work in business, government, 
and industry. While the Institute has no particular organization for 
such service, the Secretary will be glad to supply information regarding positions 
which come to his attention. 





The Institute has printed an official abstract blank to be used in submitting 
abstracts for contributed papers. A supply of these blanks can be obtained by 
writing to the Secretary. 

The deaths of two of the members of the Institute have been reported since 
the last Annual Meeting: Professor James W. Glover, University of Michigan, 
and Mr. M. C. MacLean, Dominion Bureau of Statistics, Ottawa. 

The following financial statement covers the period from January 1, 1941 to 
December 10, 1941: 


RECEIPTS 

Balance, January 1, 1941 . . . . . . . . . . . . . . . . [illegible] 
Rockefeller Foundation Grant . . . . . . . . . . . . . . 1,000.00 
Dues . . . . . . . . . . . . . . . . . . . . . . . . . . [illegible] 
Subscriptions . . . . . . . . . . . . . . . . . . . . . . [illegible] 
Sales of back numbers . . . . . . . . . . . . . . . . . . [illegible] 
Miscellaneous . . . . . . . . . . . . . . . . . . . . . . [illegible] 

Total Receipts . . . . . . . . . . . . . . . . . . . . . [illegible] 


EXPENDITURES 

Annals Office 
    Editorial Expenses 1940 
                       1941 
    Printing 
Waverly Press 
    Printing and Mailing Annals, 4 issues 
Back numbers Office 
    Postage and mailing 1940 
                        1941 
    Insurance 
    Purchase of back numbers from H. C. Carver 
    Reprinting 200 copies of Vol. V, No. 3 
Membership Committee 
Secretary-Treasurer's Office 
    Filing Case 
    Printing and Supplies 
    Postage, telegram, and express 
    Clerical Help 
Printing Programs for Meetings 
Miscellaneous 

Total Expenditures 

Balance on hand, December 10, 1941 

$5,840.2[illegible] 

In comparison with the financial condition of the Institute at the end of 1940, 
the receipts from dues, subscriptions, and sales of back numbers have increased 
nearly two thousand dollars. This is largely due to a net increase of 171 members 
and 20 subscriptions. Early in the year the Institute received the last 
thousand dollars of its grant from the Rockefeller Foundation. This source of 
income has materially assisted the Institute in surviving a period of financial 
uncertainty. Its loss will be severely felt. 

The expenditures of the Institute show a slight decrease, partly due to the 
fact that fewer back issues of the Annals had to be reprinted. An unnecessarily 
large item of expense is that of the postage which has to be paid because of the 
slowness of some members and subscribers in paying dues and reporting changes 
of address. Many copies of the Annals have to be reclaimed and mailed a 
second time. Members could save the Institute considerable expense if they 
would pay their dues promptly and report change of address well in advance of 
publication dates of the Annals. 

Financial prospects for 1942 are mixed. The importance of the statistical 
approach to problems of national defense has caused increased interest in mathematical 
statistics with the result that many people employed in government 
service or industry are applying for membership and urging their libraries to 
subscribe to the Annals. On the other hand, delivery to, and collection from, 
foreign libraries is becoming increasingly difficult, and a marked decrease in 
the number of foreign subscriptions can be anticipated. Furthermore, operating 
expenses of the Institute are almost certain to increase as material and labor 
costs advance. On the whole, it seems very probable that it will require the 
full co-operation of all the members to avoid operation at a loss during the next 
calendar year. 

Edwin G. Olds, 
Secretary-Treasurer. 

December 29, 1941 



ABSTRACTS OF PAPERS 

I. Presented on December 27, 1941, at the New York Meeting of the Institute 

A Generalized Analysis of Variance. Franklin E. Satterthwaite, University 
of Iowa and Aetna Life Insurance Company. 

This paper examines the fundamental principles underlying designs for the analysis 
of variance. Given several statistics of the type, $\chi_i^2 = \Sigma_j \theta_{ij}^2$, where the $\theta$'s are arbitrary 
orthogonalized linear functions of certain underlying normal data, $x_k$; a rule is set up for 
determining a set of $m$'s, linear functions of the $x_k$, such that $\chi_r^2$ will be 

independent of the remaining $\chi_i^2$'s. Further it is shown that simultaneously with the 
above, the $x$'s and the $\theta$'s may be subjected to certain types of linear restrictions (for the 
purpose of estimating parameters or otherwise) without disturbing the distributions or 
the independence relations except for the appropriate reduction in degrees of freedom. 
The rule used to determine the $m$'s gives results consistent with the standard designs for 
the analysis of variance. However, it goes further in that one may use weighted rather 
than simple averages in setting up his design. A practical application of this is the two 
way analysis of data which are averages and lack homogeneity of variance though constants 
of proportionality between the variances are known. The two way analysis of 
incomplete data is another practical problem which is solved by the simple expedient 
of a zero weight. The use of weighted averages frequently introduces difficulties in estimating 
parameters, particularly the mean. The combination of the linear restriction 
concept with standard analysis of variance methods solves this difficulty. 

On the Power Function of the Analysis of Variance Test. Abraham Wald, 
Columbia University. 

It is known that the power function of the analysis of variance test depends only on a 
single parameter, say $\lambda$, where $\lambda$ is a certain function of the parameters involved in the 
distribution of the sample observations. Let $Z$ be any critical region (subset of the sample 
space) whose size does not depend on unknown parameters, i.e., it has the same size for 
all values of the parameters which are compatible with the hypothesis to be tested. It is 
shown that for any positive $c$ the average power (a certain weighted integral of the power 
function) of the region $Z$ over the surface $\lambda = c$ cannot exceed the power of the analysis 
of variance test on the surface $\lambda = c$ (the power of the latter test is constant on the surface 
$\lambda = c$). P. L. Hsu's result, Biometrika, January, 1941, pp. 62-69, follows from this as a 
corollary. 

Definition of the Probable Error. E. J. Gumbel, The New School for Social 
Research. 

The probable error is usually defined either as the semi-interquartile range or as ⅔ of 
the standard error. We define it as half of the smallest interval that has the probability ½. 
For distributions which never increase (decrease), the beginning (end) of this interval is 
the origin (the median), and the end is the median (the end of the distribution). In general 
the probable error $\rho$ is the solution of the equations $W(\xi + \rho) - W(\xi - \rho) = \tfrac{1}{2}$ and 
$w(\xi + \rho) = w(\xi - \rho)$, where $\xi$ denotes the midpoint of the interval. For symmetrical distributions 
the first definition remains valid. For the Gaussian distribution the second definition 
holds besides. The numerical values for the midpoint $\xi$ and the probable error $\rho$ are given 
for some distributions usual in statistics. The calculation of the standard error of the 
probable error, which depends upon the distribution $w(x)$, determines whether the probable 
error is more or less precise than the standard error. For the asymmetrical exponential 
distribution the mean and the median have the same precision, and the probable error is 
more precise than the standard error. For the first law of Laplace, and for Galton's reduced 
distribution the median and the probable error are more precise than the mean 
and the standard error. For Maxwell's distribution the mean and the probable error are 
more precise than the median and the standard error. 

A Class of Multivariate Distributions. Walter Jacobs, Securities and Exchange 
Commission, Washington. 

The multivariate normal distribution has the property that its probability density is 
constant along the surface of a hyper-ellipsoid. The class of distributions characterized 
by this property is considered. The form of the characteristic function of any distribution 
of the class is determined; in this way the parameters of the distribution are shown 
to be simply related to the first and second moments, when these exist. 

Every distribution of the class is the n-variate extension of a univariate symmetrical 
distribution. The method of determining the form of the extension of such a univariate 
distribution is given. A number of properties of regression for the multivariate normal 
distribution are shown to hold for any distribution of the class. Among other properties 
considered is the form of some sampling distributions. Some special cases of interest, 
including the extensions of the Cauchy distribution and the median law, are discussed 
briefly. 

Methods for Scanning Data to Determine the Significance of the Difference 
Between the Frequency of an Event in Contrasted Groups. Joseph Zubin, 
N. Y. S. Psychiatric Institute, New York. 

In many investigations in Psychology, Sociology, Economics and Public Health, there 
is a need for a quick and ready method for scanning a mass of data in order to select the 
items that have a significant bearing on the problem under investigation. The statistical 
procedure for this item analysis consists essentially of evaluating the 2 X 2 tables which 
arise when two groups are contrasted for the presence and absence of a given character 
or event. The chi square method or its equivalent, the ratio of the difference between 
per cents to its standard error, require considerable labor and time and several methods 
have been proposed for shortening the work. Recently a method was developed which 
eliminates the need for computing percentages or expected values, the analysis being made 
with the absolute frequencies. This method depends upon transforming $p$, the per cent, 
to the inverse sine function of $\sqrt{p}$. The method is applicable not only to 2 X 2 tables but 
can also be made applicable to 2 X n tables and r X n tables with the aid of simple formulae. 

Compounding Probabilities from Independent Significance Tests. W. Allen 
Wallis, Stanford University. 

For combining the probabilities obtained from $N$ independent tests of significance into 
a single measure, the product of the $N$ independent probabilities provides a criterion which, 
though rarely ideal, is usually satisfactory. The probability that such a product will be 
less than $Q$ always exceeds $Q$, and is the sum of the first $N$ terms in a Poisson series whose 
parameter is $-\log_e Q$; since this sum is also the probability that a value of $\chi^2$ based on $2N$ 
degrees of freedom will exceed $-2 \log_e Q$, existing tables of $\chi^2$ may (as R. A. Fisher has 
pointed out in Statistical Methods for Research Workers, section 21.1) be used to test the 
significance of a product of probabilities. If any of the probabilities have been derived 
from discontinuous distributions, as is likely with small samples of non-metric data, this 
method of calculating the probability of the product fails; in such instances it invariably 
overstates the probability of the product. Formulas are given for various special cases 
arising frequently in practice and also for the general case of $D + C$ tests of which $D$ are 
based on discontinuous distributions and $C$ on continuous distributions. In several illustrative 
examples, the overstatement of the joint probability consequent upon neglect 
of discontinuities is of the order of 100 to 200 per cent. 
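
The Poisson-series rule stated in this abstract can be sketched in a few lines of Python. This is only an illustration for the continuous case; the function names `prob_product_below` and `combined_significance` are hypothetical and do not come from the paper, and the corrections the abstract describes for discontinuous distributions are not attempted here.

```python
import math


def prob_product_below(q, n):
    """Probability that the product of n independent, continuous
    (uniformly distributed) probabilities falls below q.

    Computed as the sum of the first n terms of a Poisson series
    with parameter -log_e(q), as described in the abstract.
    """
    lam = -math.log(q)
    return sum(math.exp(-lam) * lam ** k / math.factorial(k) for k in range(n))


def combined_significance(p_values):
    """Combine independent probabilities through their product.

    Assumes every probability comes from a continuous distribution.
    """
    q = math.prod(p_values)
    return prob_product_below(q, len(p_values))
```

For a single test the series reduces to its first term, so the function returns the probability itself; for two or more tests the result is always larger than the raw product.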

A Method of Computing the Roots of the General Cubic Equation with Real or 
Complex Coefficients. Ernest E. Blanche, Michigan State College. 

The general cubic equation with real or complex coefficients may readily be reduced to 
the form $y^3 + 3Hy + G = 0$. Suitable substitutions for $y$ in the reduced equation permit 
the use of the identities for hyperbolic functions and circular functions: $\sin 3x$, $\cos 3x$, 
$\sinh 3x$, $\cosh 3x$ and $\sin(u + iv)$. The following classifications may be set up: (A) If $G < 0$ 
and $H > 0$, only real root is $y = 2\sqrt{H}\,\sinh x$ where $\sinh 3x = -G/2H\sqrt{H} = M$, (B-1) If $G < 0$, 
$H < 0$, $G/2H\sqrt{-H} \le 1$, three real roots, obtained by use of circular identity, $\cos 3x$, (B-2) 
If $G < 0$, $H < 0$, $G/2H\sqrt{-H} > 1$, only real root is $y = 2\sqrt{-H}\,\cosh x$ where $\cosh 3x = 
G/2H\sqrt{-H}$. Complex roots are $-\tfrac{1}{2} y_1 \pm bi$. The general cubic with complex coefficients 
has solutions $y = -2\sqrt{H}\,\sin(u + 2n\pi/3 + iv)$ for $n = 0, 1, 2$, where $\sin(3u + 3iv) = 
a + bi = M$. For $M$ real, special cases are similar to (A), (B-1) and (B-2). 
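
A minimal Python sketch of the real-coefficient cases follows, assuming the reduced form $y^3 + 3Hy + G = 0$ with real $H \neq 0$. The function name is hypothetical, and the handling of either sign of $G$ (through `abs` and `copysign`) is an extension added here for completeness rather than something stated in the abstract. The substitutions rest on the identities $\sinh 3x = 4\sinh^3 x + 3\sinh x$, $\cos 3x = 4\cos^3 x - 3\cos x$ and $\cosh 3x = 4\cosh^3 x - 3\cosh x$.

```python
import math


def real_roots_reduced_cubic(H, G):
    """Real roots of y**3 + 3*H*y + G = 0 for real H != 0.

    Uses the hyperbolic and circular substitutions of the abstract.
    """
    if H == 0:
        raise ValueError("H must be non-zero for these substitutions")
    if H > 0:
        # Case (A): y = 2*sqrt(H)*sinh(x), sinh(3x) = -G / (2*H*sqrt(H)).
        x = math.asinh(-G / (2 * H * math.sqrt(H))) / 3
        return [2 * math.sqrt(H) * math.sinh(x)]
    r = 2 * math.sqrt(-H)
    t = G / (2 * H * math.sqrt(-H))
    if abs(t) <= 1:
        # Case (B-1): y = r*cos(x + 2*pi*k/3), cos(3x) = t, three real roots.
        x = math.acos(t) / 3
        return sorted(r * math.cos(x + 2 * math.pi * k / 3) for k in range(3))
    # Case (B-2): y = +/- r*cosh(x), cosh(3x) = |t|, one real root.
    x = math.acosh(abs(t)) / 3
    return [math.copysign(r * math.cosh(x), t)]
```

For example, $y^3 - 7y + 6 = 0$ corresponds to $H = -7/3$, $G = 6$ and falls under case (B-1), with roots $-3$, $1$ and $2$.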


Limited Type of Probability Distribution Applied to Flood Flows (Preliminary 
Report). Bradford F. Kimball, Port Washington, N. Y. 

Relative to Gumbel's recent paper on Flood Flows (E. J. Gumbel, "The return period 
of flood flows," Annals of Mathematical Statistics, Vol. 12 (1941)) the author points out that 
Gumbel's argument that the probability distribution of maximum values does not stem 
from a limited form of primary probability distribution of the stream flow, is misleading 
(see page 177, loc. cit.). One might argue for a primary probability distribution of stream 
flows of the type: $dV = \exp(-\tfrac{1}{2}u^2)\,du$ where $u = k[b - \log(a - x)]$, $0 \le x \le a$, where $x$ is 
the measure of flow. This increment of $x$ is related to normal probability increment by 
the linear equation $k\,dx = (a - x)\,du$. This distribution will not satisfy the condition that 
von Mises uses in his argument concerning a finite distribution since the cumulative distribution 
$V$ does not possess a positive derivative of finite order at $x = a$. Also, although $x$ 
does not have infinite range, the transformed variate $u$ has an infinite range to the right, 
and will satisfy von Mises' argument for the derivation of the cumulative distribution of 
the maxima, of the form $\exp[-\exp(-\alpha(u - u_0))]$ in terms of $u$. The author finds that 
such a distribution more accurately describes the behavior of maximum annual flood flows 
than one which ignores the existence of an upper limit $a$. 


Additive Partition Functions. J. Wolfowitz, New York City. 

Let $n_1$ and $n_2$ be positive integers and let 

$m = \max\left(\dfrac{n_1}{n_1 + n_2},\ \dfrac{n_2}{n_1 + n_2}\right).$

Let the stochastic variable $V = (v_1, v_2, \cdots, v_s)$ be any sequence of positive integers such that 
$v_1 + v_3 + v_5 + \cdots$ is equal to either one of $n_1$ and $n_2$, while $v_2 + v_4 + v_6 + \cdots$ is equal to 
the other. Two sequences $V$ with the same elements arranged in different order are to be 
considered distinct and all sequences $V$ are to be assigned the same probability. Such 
sequences are of statistical importance (Wald and Wolfowitz, Annals of Math. Stat., Vol. 11 
(1940)). Let $f(x)$ be a function defined for all positive integral values of $x$ which fulfills 
the following conditions: 

1. There exists a pair of positive integers, $a$ and $b$, such that 

$\dfrac{f(a)}{f(b)} \neq \dfrac{a}{b}.$

2. The series 

2 |wi^‘ 

t~i 

is convergent. Then, as $n_1$ and $n_2 \to \infty$, while $n_1/n_2$ remains constant, the distribution of 
the stochastic variable 

$F(V) = \sum_{i=1}^{s} f(v_i)$

approaches the normal distribution. When $f(x) \equiv 1$, $F(V) = U(V)$ (loc. cit., Theorem I). 
When $f(x) = \log\,(\ldots)$, $F(V)$ is a statistic introduced by the author (Amer. Math. Soc. Bull. 
(1941), p. 216). 

A similar result holds for partitions of a single integer. 
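
As a rough illustration of the additive statistic $F(V)$, the Python sketch below takes $V$ to be the lengths of successive runs in an arrangement of two kinds of elements. That reading is an assumption drawn from the cited Wald and Wolfowitz setting, and the function names are hypothetical.

```python
from itertools import groupby


def run_lengths(sequence):
    """Lengths of successive runs of identical elements, in order."""
    return [len(list(group)) for _, group in groupby(sequence)]


def additive_partition_statistic(sequence, f):
    """Sum of f(v) over the run lengths v of the sequence."""
    return sum(f(v) for v in run_lengths(sequence))
```

With `f = lambda v: 1` the statistic is simply the number of runs. With `f = lambda v: v` it always equals the length of the sequence, which is the degenerate choice that a condition such as the first one above is meant to exclude.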

II. Presented on December 29, 1941, at the joint session of the Institute, 
The Econometric Society, and Section A of the A. A. A. S. 

Certain Tests for Randomness Applied to Data Grouped into Small Sets. 

Edward L. Dodd, University of Texas. 

G. Udny Yule, in his paper A Test of Tippett's Random Sampling Numbers (Roy. Stat. 
Soc. Jour., Vol. 101 (1938), pp. 167-172), described tests applied to certain sums of the 
Tippett numbers. Yule regarded the Tippett numbers as not altogether satisfactory. 

The tests now to be described, however, involve no summation. For sets of three 
digits, four classes may be distinguished: The middle number may be the largest, or it 
may be the least; or the sequence may be monotone increasing or monotone decreasing; 
here the sequence $a, a, a$ may be classified with the monotone increasing sequences when 
$a > 4$; otherwise, with the monotone decreasing sequences. Similarly, six consecutive 
digits in two sets of three digits each give rise to sixteen classes. On the basis of range, 
sets of two or more of the digits 0, 1, 2, ..., 9 may be separated into ten classes. 

Chi-square tests applied by the present author on the basis of the foregoing and similar 
classifications have not thus far indicated that the Tippett numbers are not satisfactorily 
random. 
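
The classification by range can be sketched in Python as follows. The abstract gives no formulas, so this is an assumed reconstruction: expected frequencies are found by enumerating every equally likely set of $k$ digits, and the function names are hypothetical.

```python
from itertools import product


def range_class_counts(k):
    """Count, among all 10**k ordered sets of k digits, how many fall
    in each of the ten range classes (range = largest - smallest)."""
    counts = [0] * 10
    for digits in product(range(10), repeat=k):
        counts[max(digits) - min(digits)] += 1
    return counts


def chi_square_for_ranges(observed, k):
    """Chi-square statistic comparing observed class frequencies with
    those expected for random digits grouped in sets of k."""
    counts = range_class_counts(k)
    total = sum(observed)
    n_sets = 10 ** k
    statistic = 0.0
    for obs, count in zip(observed, counts):
        expected = total * count / n_sets
        statistic += (obs - expected) ** 2 / expected
    return statistic
```

For pairs of digits the enumeration gives 10 sets with range 0 and $2(10 - r)$ sets with range $r$ for $r = 1, \ldots, 9$.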

Stratified Sampling. A. M. Mood, University of Texas. 

When certain relations between the probabilities $p_1, p_2, \cdots, p_k$ of a multinomial population 
are known in advance, the technique of stratified sampling provides more efficient 
estimates of the probabilities than does random sampling. Under certain conditions 
of stratified sampling, however, the maximum likelihood estimates, $n_i/n$, of $p_i$ are biased 
but are unbiased in the limit as the sample size increases. The methods and results of the 
theory of maximum likelihood require no modification to be made applicable to the problem 
of estimation in stratified sampling; in fact the results of this theory imply the use of 
stratified sampling when the conditions for its use obtain. 

Advantages of Singling Out Degrees of Freedom in Analyses of Variance. 

William Dowell Baten, Michigan Agricultural Experiment Station. 

This paper pertains to an experiment involving dummy plots for analyzing effects of 
placements and fertilizers for cannery peas. Three fertilizers were used at different distances 
from the pea seeds at planting, the design being a randomized block layout. Advantages 
are given for breaking up the sum of squares, due to differences between "treatment" 
means, into sums of squares, each with one degree of freedom. Methods are given 
for securing the sum of squares involving dummy plots, and obtaining the variances due 
to main effects and interaction. Interpretations are given for each phase of the analysis. 




THE ANNALS OF MATHEMATICAL STATISTICS 

Vol. XIII, No. 2 

Contents 

The Progeny of an Entire Population. Alfred J. Lotka 
Asymptotically Shortest Confidence Intervals. Abraham Wald 
Grouping Methods. P. S. Dwyer 
On the Correct Use of Bayes' Formula. R. v. Mises 
An Iterative Method of Adjusting Sample Frequency Tables When Expected 
Marginal Totals Are Known. Frederick F. Stephan 
Estimation of Volume in Timber Stands by Strip Sampling. A. A. Hasel 
Tabulation of the Probabilities for the Ratio of the Mean Square 
Successive Difference to the Variance. B. I. Hart and 
John von Neumann 
Cumulative Frequency Functions. Irving W. Burr 

Notes: 

An Approximate Normalization of the Analysis of Variance Distribution. 
Edward Paulson 
Note on the Distribution of Roots of a Polynomial with Random Complex 
Coefficients. M. A. Girshick 
A Note on the Probability of Arbitrary Events. Hilda Geiringer 
An Inequality for Mill's Ratio. Z. W. Birnbaum 






THE PROGENY OF AN ENTIRE POPULATION¹ 

By Alfred J. Lotka 
Metropolitan Life Insurance Company 

The literature on renewal theory has grown to considerable dimensions, 
until even admittedly incomplete bibliographies list over 100 titles. But a 
surprisingly small proportion of these publications exhibits any practical applications 
to concrete data, and such applications as have been made (e.g. by 
Wicksell, Hadwiger, Rhodes) are for the most part of restricted scope. 

Anyone who has been following the development will, I think, feel that this is 
unfortunate. It has a double disadvantage. On the one hand the purely 
theoretical discussions emphasize difficulties which in practice may be relatively 
unimportant, being inherent either in some of the unrealistic ad hoc examples 
discussed, or in the expressions used to fit smooth curves to the basic data, 
rather than in these data themselves. On the other hand some real difficulties 
in application to actual data seem to require further clarification. 

Several of the applications that have been made, including some of my own, 
are restricted to following up the "progeny" of a "population element" comprising 
only individuals all originating at the same time and therefore all of the 
same age (in the case of industrial equipment installation all made at one point 
of time). The analysis set forth in the treatment of this special case is competent 
also to deal with the practically more important case of the progeny of an initial 
population of given age distribution, though no example of this has hitherto 
been published.² Such an example will now be given, and at the same time this 
will afford an opportunity to clarify some points in the presentation of the more 
general case. 

Let $N_t$ be the total number of females at time $t$, and $c_t(a)$ the number
comprised within the age limits $a$ and $a + da$. Also, let $m_t(a)$ be the age-specific
fertility of females of age $a$, counting daughters only. If $\alpha$ and $\omega$ are,
respectively, the lower and the upper limit of the female reproductive period, and $B(t)$
the annual births of females, then

$$B(t) = \int_\alpha^\omega N_t\, c_t(a)\, m_t(a)\, da. \tag{1}$$

However, it is not in this perfectly general form that the relation is to be
applied. The case to be considered is that in which the “initial” population is,
throughout its “future” development, subject to constant age-specific fertility


¹ Compare A. J. Lotka, “The progeny of a population element,” Am. Jour. Hygiene,
Vol. 8 (1928), p. 876.

² An example was given by the writer in an oral communication to the Eighth American
Scientific Congress, May 1940, the Proceedings of which have not so far been published.



ALFRED J. LOTKA


and mortality. If we denote the “initial” time by $t = \omega$ (which we can do since
the zero of time is arbitrary), we can then write

$$B(t) = \int_\alpha^\omega N_t\, c_t(a)\, m_\omega(a)\, da, \qquad t > \omega. \tag{2}$$

Also, if $p_{t-a}(a)$ is the probability for a female born at time $t - a$ of
surviving to time $t$, being then $a$ years old, we have

$$B(t-a)\, p_{t-a}(a) = N_t\, c_t(a), \tag{3}$$

and, in particular, since in the case under consideration $p_{t-a}(a)$ is constant for
$t - a > \omega$, i.e., for individuals born after time $\omega$,

$$B(t-a)\, p_\omega(a) = N_t\, c_t(a), \qquad t > a + \omega. \tag{4}$$

Now, we have been at liberty for the “future” values of $m_t(a)$ and $p_{t-a}(a)$
to make the arbitrary assumption that they retain their values as of $t = \omega$ and
$t - a > \omega$, respectively. But for the “past” of the system under consideration
we do not have equal liberty, for any assumption we make must be compatible
with

(a) the initial age distribution

(b) equation (1).

We can, however, within these limitations, assume that (4) still holds for
$0 < t < \omega$, thus

$$B(t-a)\, p_\omega(a) = N_t\, c_t(a), \qquad t > 0. \tag{5}$$

Introducing this in (1) we have

$$B(t) = \int_\alpha^\omega B(t-a)\, p_\omega(a)\, m_t(a)\, da, \qquad t > 0. \tag{6}$$


But we cannot now further assume that

$$m_t(a) = m_\omega(a), \qquad t > 0, \tag{7}$$

for, in general, this would make (6) incompatible with (1).

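We can, however, split the integral in (6) into two parts, thus

$$B(t) = \int_t^\omega B(t-a)\, p_\omega(a)\, m_t(a)\, da + \int_0^t B(t-a)\, p_\omega(a)\, m_\omega(a)\, da, \tag{8}$$

with the assumption, only in the range $a < t$,

$$m_t(a) = m_\omega(a), \qquad a < t. \tag{9}$$

Denoting the first integral in (8) by $F(t)$, and contracting $p_\omega(a)\, m_\omega(a)$ to
$\varphi_\omega(a)$, we may write (8) in the form

$$B(t) = F(t) + \int_0^t B(t-a)\, \varphi_\omega(a)\, da \tag{10}$$

$$\phantom{B(t)} = F(t) + \beta(t), \tag{11}$$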



PROGENY OF A POPULATION


with

$$F(t) = 0, \quad t > \omega; \qquad F(t) = B(t), \quad 0 < t < \alpha, \tag{12}$$

and

$$B(t) = \int_0^\omega B(t-a)\, \varphi_\omega(a)\, da, \qquad t > \omega.\,{}^{3} \tag{13}$$
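The relation $B(t) = F(t) + \int_0^t B(t-a)\,\varphi_\omega(a)\,da$ can be advanced one time step at a time once $F$ and $\varphi_\omega$ are tabulated. The following is a minimal sketch of such a discretization, not the computation used in the paper: the uniform grid, the right-endpoint rectangle rule, and the name `solve_renewal` are all assumptions made for illustration.

```python
def solve_renewal(F, phi, dt):
    """Step-by-step solution of B(t) = F(t) + int_0^t B(t-a) phi(a) da.

    F[n] and phi[n] are samples at t = n*dt (hypothetical inputs).
    The integral is replaced by a right-endpoint rectangle rule,
    so phi[0] is never used.
    """
    n_steps = len(F)
    phi = list(phi)
    if len(phi) < n_steps:
        # phi vanishes beyond the reproductive period, so pad with zeros
        phi += [0.0] * (n_steps - len(phi))
    B = []
    for n in range(n_steps):
        # births at step n from mothers born at the earlier steps n-k
        conv = sum(B[n - k] * phi[k] for k in range(1, n + 1))
        B.append(F[n] + dt * conv)
    return B


if __name__ == "__main__":
    # Illustrative inputs only: constant phi and F are not demographic data.
    print(solve_renewal([1.0] * 4, [0.1] * 4, 0.5))
```

With constant $F = 1$ and constant $\varphi = c$ the scheme reduces to compound growth, $B_n = (1 + c\,\Delta t)^n$, which gives a convenient check on the recursion.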

The assumption (9) has a definite physical meaning. The integral in (6)
has been so split that the first part, $F(t)$, gives the births of daughters from
mothers who themselves were born before $t = 0$, while the second part, $\beta(t)$,
gives the births of daughters from mothers born after $t = 0$. Equation (9)
therefore expresses the assumption that for mothers born at or after $t = 0$,
the age-specific fertilities for ages $a < t$ have the same values $m_\omega(a)$, independent
of $t$, as prevail for $t = \omega$. But at time $t$ there are no mothers of age $a > t$
who were born after $t = 0$. Hence the assumption (9) can be quite simply
stated to the effect that the age-specific fertilities $m_\omega(a)$ apply to all mothers
born after time $t = 0$. This assumption cannot, in general, be made for mothers
born before $t = 0$, because it would not, in general, be compatible with the
given initial age distribution and at the same time with assumption (5). Hence
in the first integral of (8), denoted by $F(t)$ in (10), we must write $m_t(a)$, not
$m_\omega(a)$.

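Equation (10) is of the form discussed by G. Herglotz,⁴ who writes its solution,
for $t > 0$, in the form of an exponential series,

$$B(t) = \sum_j Q_j\, e^{r_j t}, \tag{14}$$

where the exponents $r_j$ are the roots of the characteristic equation,

$$\Phi(r) = \int_\alpha^\omega e^{-ra}\, \varphi_\omega(a)\, da = 1, \tag{15}$$

while the coefficients $Q_j$ are given by

$$Q_j = \frac{\displaystyle\int_0^\omega F(t)\, e^{-r_j t}\, dt}{\displaystyle\int_\alpha^\omega a\, e^{-r_j a}\, \varphi_\omega(a)\, da}. \tag{16}$$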


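There is only one real root of (15), since $\varphi_\omega(a) \ge 0$ for all values of $a$. For
complex roots it is convenient to write the corresponding terms of the series (14)
in trigonometric form: a pair of conjugate roots $r, \bar r = u \pm iv$ with coefficients
$Q, \bar Q = U \pm iV$ contributes

$$Q e^{rt} + \bar Q e^{\bar r t} = 2U e^{ut} \cos vt - 2V e^{ut} \sin vt \tag{17}$$

$$\phantom{Q e^{rt} + \bar Q e^{\bar r t}} = 2\sqrt{U^2 + V^2}\; e^{ut} \cos (vt + \theta), \tag{18}$$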


³ Since $\varphi_\omega(a) = 0$ for $a > \omega$.

⁴ Math. Annalen, Vol. 66 (1908), pp. 87 et seq.




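where

$$\tan \theta = V/U, \tag{19}$$

and

$$U = \frac{GR + HS}{G^2 + H^2}, \tag{20}$$

$$V = \frac{HR - GS}{G^2 + H^2}, \tag{21}$$

in which

$$G = \int_\alpha^\omega a\, e^{-ua} \cos va\; \varphi_\omega(a)\, da, \tag{22}$$

$$H = \int_\alpha^\omega a\, e^{-ua} \sin va\; \varphi_\omega(a)\, da, \tag{23}$$

$$R = \int_0^\omega e^{-ut} \cos vt\; F(t)\, dt, \tag{24}$$

$$S = \int_0^\omega e^{-ut} \sin vt\; F(t)\, dt. \tag{25}$$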
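As a numerical check on formulae (17) to (21), the coefficient of a complex root can be computed as $Q = (R - iS)/(G - iH)$, whose real and imaginary parts are the $U$ and $V$ of (20) and (21). The sketch below is an illustration under that reading; the function names are hypothetical, and the inputs are the mantissas printed for the first oscillatory component in Table I with their powers of ten omitted, so $U$ and $V$ are recovered only up to a decimal scale factor.

```python
import math


def herglotz_coefficient(G, H, R, S):
    """Real and imaginary parts (U, V) of Q = (R - iS) / (G - iH)."""
    denom = G * G + H * H
    U = (G * R + H * S) / denom
    V = (H * R - G * S) / denom
    return U, V


def oscillatory_term(U, V, u, v, t):
    """Contribution of a conjugate root pair u +/- iv at time t, form (17)."""
    return 2.0 * math.exp(u * t) * (U * math.cos(v * t) - V * math.sin(v * t))


def oscillatory_term_polar(U, V, u, v, t):
    """Same contribution in amplitude-phase form (18), with tan(theta) = V/U."""
    theta = math.atan2(V, U)
    return 2.0 * math.hypot(U, V) * math.exp(u * t) * math.cos(v * t + theta)


# Mantissas of G, H, R, S for the first oscillatory component (Table I);
# the powers of ten are left out, so only the leading digits are compared.
U, V = herglotz_coefficient(G=25.768, H=14.938, R=-17.863, S=-31.508)
print(U, V)
```

The leading digits agree with the printed mantissas of $U$ and $V$ for that component (10.494 and 61.442), which supports the signs used in (20) and (21).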

For purposes of numerical application to the problem here considered, we must
express the annual births $B(t)$ for $t < \omega$ in terms of the given “initial” age
distribution at time $\omega$.

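We have, generally,

$$B(t-a) = \frac{N_t\, c_t(a)}{p_{t-a}(a)} = \frac{N_\omega\, c_\omega(a + \omega - t)}{p_\omega(a + \omega - t)}, \tag{26}$$

since individuals of age $a$ at time $t$ are $a + \omega - t$ years old at time $\omega$.
Introducing the relation (26) in (10) we have

$$B(t) = F(t) + \int_0^t \frac{N_\omega\, c_\omega(a + \omega - t)}{p_\omega(a + \omega - t)}\; p_\omega(a)\, m_\omega(a)\, da, \tag{27}$$

and

$$F(t) = B(t) - \int_\alpha^t \frac{N_\omega\, c_\omega(a + \omega - t)}{p_\omega(a + \omega - t)}\; p_\omega(a)\, m_\omega(a)\, da \tag{28}$$

$$\phantom{F(t)} = \frac{N_\omega\, c_\omega(\omega - t)}{p_\omega(\omega - t)} - \int_\alpha^t \frac{N_\omega\, c_\omega(a + \omega - t)}{p_\omega(a + \omega - t)}\; \varphi_\omega(a)\, da \tag{29}$$

$$\phantom{F(t)} = B(t) - \beta(t). \tag{11a}$$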





Note that, in computing the integral $\beta(t)$ for any particular value of $t$, the
argument of the function $c_\omega$ runs from $\alpha + \omega - t$ to $\omega$. Thus, for example, if
the zero of time is 1865 and $t = \omega$ is at 1920, then, in computing $F(35)$, i.e.,
the value of $F$ for 1900, the range of the argument of $c_\omega$ in the integral will be
from $10 + 55 - 35$ to 55, i.e., from 30 to 55.

Numerical Example. By way of a numerical illustration these principles will
now be applied to a concrete case. We shall start with the age distribution of
the white female population of the United States as constituted in 1920, for
which previous publications furnish some of the required data, including the real
root and the first three pairs of complex roots of the characteristic equation.

From this “initial” age distribution in 1920 it is necessary first of all to
compute the auxiliary function $F(t)$ for the 55 years prior to 1920. The first term
$B(t)$ in the right hand member of (28) is very easily computed for successive
values of $t$ from the relation (5a), which simply expresses the fact that persons
$a$ years old in the year $\omega$, i.e., 1920, are the survivors of the $B(\omega - a)$ persons
born in the year $\omega - a$:

$$N_\omega\, c_\omega(a) = B(\omega - a)\, p_\omega(a). \tag{5a}$$
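Relation (5a) is inverted by dividing the number of survivors at age $a$ by the survival probability to that age. The sketch below illustrates this under stated assumptions: the function name is hypothetical, the survivor count is invented for the example, and only the life-table value $p(0.5) = .9400$ is taken from the paper's footnote.

```python
def births_from_survivors(survivors, survival_prob):
    """Invert (5a): B(omega - a) = N_omega * c_omega(a) / p_omega(a)."""
    if not 0.0 < survival_prob <= 1.0:
        raise ValueError("survival probability must lie in (0, 1]")
    return survivors / survival_prob


# p(0.5) = .9400 is the life-table value quoted in the paper's footnote;
# the survivor count is a made-up illustration, not census data.
print(births_from_survivors(survivors=940_000, survival_prob=0.9400))
```

Applied age by age to the 1920 age distribution, this gives the curve of $B(t)$ "by survivals" for the years before 1920.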

In the diagram Fig. 1, which is drawn in stereographic projection, the age
distribution of the (white female) population of the United States in 1920 is
represented as plotted in a plane reaching forward at right angles to the plane
of the paper. Successive points of $B(t)$ for $0 < t < \omega$ have been computed
“by survivals” according to (5a) and plotted as a curve in the plane of the
paper “at the back” of the diagram. The arrows indicate for a selected point,
namely age 25 in 1920, the path of the computation according to equation (5a).

The second term $\beta(t)$ in the expression (11a) for $F(t)$ was computed from the
age distribution in 1920, the rates of survival from previous years into 1920,⁵
and the age-specific fertility at each age in the reproductive period, 10 to 55,
on the basis of the relation (28). The results for this second term in the
expression for $F(t)$, computed for every fifth calendar year back of 1920 to 1875
and interpolated for intervening years,⁶ were also plotted as a curve in the rear
plane of the diagram. The shaded area in the curve for the age distribution in
1920, and the arrows leading from this shaded area to the curve

$$\beta(t) = \int_\alpha^t B(t-a)\, \varphi_\omega(a)\, da \tag{10, 11}$$

$$\phantom{\beta(t)} = \int_\alpha^t \frac{N_\omega\, c_\omega(a + \omega - t)}{p_\omega(a + \omega - t)}\; \varphi_\omega(a)\, da, \tag{29, 11a}$$

indicate in this case the path of the computation according to equation (28).

⁵ Using the Foudray life table for white females in 1919-1920. In the first quinquennial
age group, the following values were used:

p(0.5) = .9400, p(1.5) = .9235, p(2.5) = .9135, p(3.5) = .9080, p(4.5) = .9040.

⁶ This term vanishes for $t < 10$, i.e., back of 1875.





From these two curves, taking differences, the curve of $F(t) = B(t) - \beta(t)$
was plotted, as shown.

With the values of $F(t)$ thus obtained, we may proceed, by formulae (14) to
(25), to compute values of $B(t)$ for all values of $t > 0$. So far as the period
1865 to 1920, corresponding to $0 < t < \omega$, is concerned, this merely means that
we have an analytical expression to fit what is essentially a fundamental datum
of the problem. For values of $t > \omega$ the formula gives us a continuation of the
function $B(t)$ for all future time so long as the given age-specific fertility and
mortality hold.



Fig. 1. Graph illustrating computation of auxiliary function $F(t)$ from “initial” age
distribution.


The final results of this computation are exhibited in Figs. 2, 3 and 4. Of
these, Fig. 2 exhibits the first, second and third oscillatory components for the
period from 1890 forward. It will be seen that the waves are heavily damped,
so that after a relatively short period the aperiodic component dominates the
course of events.

Fig. 3 exhibits, for the years from 1865 to 1920, i.e., for the period $0 < t < \omega$,
the aperiodic component (in a dashed line) and, as indicated by small circles,
the sum of this component plus the three oscillatory components. It will be
seen that from about 1890 forward the points so obtained follow rather closely
the value $B(t)$ derived by survivals from the age distribution in 1920.




Fig. 2. First three oscillatory components of total annual births. (Horizontal axis: calendar year.)

Fig. 3. Graph of functions $B(t)$, $\beta(t)$, and $F(t)$ for $0 < t < \omega$, i.e., for 1865 to 1920,
together with aperiodic component; also, summation of aperiodic and first three
oscillatory components (sum of first four components shown as small circles).
(Horizontal axis: calendar year, 1865 to 1920.)




Fig. 4. Sum of aperiodic and three oscillatory terms of series solution compared with
results of “step by step” computation of annual births. (Vertical axis: millions;
horizontal axis: calendar year, 1890 to 1990.)


TABLE I

Constants of the Series Solution (14) of Integral Equation (10) to Third Oscillatory
Component Inclusive; t = 0 at 1865

([?] marks a power of ten that is illegible in the scan.)

Function   Aperiodic            Oscillatory components
           component            First                Second               Third

u          .643 × 10^-2         -.386 × 10^[?]       -8.731 × 10^-2       -9.801 × 10^-2
v          0                    21.448 × 10^-2       31.542 × 10^-2       48.849 × 10^[?]
G          28.226               25.768               51.226               37.008
H          0                    14.938               -18.637              17.206
R          23.262 × 10^[?]      -17.863 × 10^[?]     -37.190 × 10^[?]     11.684 × 10^[?]
S          0                    -31.508 × 10^[?]     16.827 × 10^[?]      -16.543 × 10^[?]
U          82.416 × 10^[?]      -10.494 × 10^[?]     -74.679 × 10^[?]     88.014 × 10^[?]
V          0                    61.442 × 10^[?]      -56.787 × 10^[?]     48.808 × 10^[?]




Prior to about 1890, four components alone are quite inadequate, and the
corresponding points have been omitted from the diagram. The lack of
concordance, with such limited components, is inconsequential in this part of the
series, since the purpose of this part of the work was merely to compute the
auxiliary function $F(t)$, and the fit obtained for $B(t)$ in this range, so far as it goes,
is merely a by-product, the main interest being in the course of $B(t)$ for $t > \omega$,
i.e., in the years following 1920.

This course is charted in Fig. 4, in which the points obtained by the series
solution (14) of (10) are again shown as small circles, while the fully drawn curve
is derived from my previous publication “The Progressive Adjustment of Age
Distribution to Fecundity.”⁷ The annual births in that case were obtained
“step by step” by computing age distributions by survivals for successive
quinquennial periods, and applying to the reproductive age groups, in each
case, the values of the reproductivity $m_\omega(a)$.

TABLE II

United States White Female Population 1920, Observed; Also, the Same Projected
Forward for Later Years⁸

Year    Population,    Births,      Birth rate per 1,000
        thousands      thousands    per annum

1920    46,390         1,082        23.32
1930    51,727         1,162        22.46
1940    56,910         1,252        22.00
1950    61,639         1,307        21.20
1960    65,838         1,379        20.95
1970    69,829         1,465        20.98
1975    71,828         1,504        20.94
1980    73,850         1,543        20.89
1985    75,902         1,584        20.87
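The last column of Table II is simply births divided by population, times 1,000, so the table can be checked for internal consistency. A minimal sketch follows; the helper name is hypothetical, and only rows whose figures are unambiguous in the scan are included.

```python
def birth_rate_per_1000(births_thousands, population_thousands):
    """Crude birth rate per 1,000 per annum from births and population."""
    return 1000.0 * births_thousands / population_thousands


# (year, population in thousands, births in thousands, printed birth rate)
TABLE_II_ROWS = [
    (1930, 51727, 1162, 22.46),
    (1940, 56910, 1252, 22.00),
    (1970, 69829, 1465, 20.98),
    (1975, 71828, 1504, 20.94),
    (1980, 73850, 1543, 20.89),
    (1985, 75902, 1584, 20.87),
]

for year, population, births, printed in TABLE_II_ROWS:
    rate = birth_rate_per_1000(births, population)
    print(year, round(rate, 2), printed)
```

Each recomputed rate agrees with the printed value to within rounding.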

It will be seen that the points obtained by the solution (14) follow very closely
those computed “step by step,” although in the computation of the latter an
approximation was made, using pivotal values of $p_\omega(a)$ for the several
quinquennial age groups. A slight error introduced in this way would tend to be
cumulative, and perhaps accounts for the fact that towards the end of the
period covered (1985), the two sets of values diverge slightly. Even so, in 1985,
the divergence is only about .4 percent.

The series solution has, of course, the advantage that it gives directly the
result for any particular point of time, whereas the “step by step” method
requires the computation of the annual births for all intervening points in order
to obtain the result for the chosen point of time.

⁷ Jour. Washington Acad. Sci., Vol. 16 (1926), p. 505.

⁸ Calculated step by step from survival ratios and age-specific fertilities, both held
constant as of 1920 (reproduced for ready reference from Jour. Wash. Acad. Sci., Vol. 16,
p. 505).

Furthermore, the series tells us at once that the course of events is of the nature
of a trend proceeding in geometric progression, upon which are superposed a
series of damped oscillations, of which the fundamental has a wave length equal
approximately to the mean length of one generation from mother to daughter,
i.e., about 28 years.
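The wave length of an oscillatory component $e^{ut}\cos(vt + \theta)$ is $2\pi/v$. As a small worked example, the value $v = 21.448 \times 10^{-2}$ printed for the first oscillatory component in Table I can be converted to years; the function name below is hypothetical.

```python
import math


def wave_length_years(v):
    """Period 2*pi/v of a component e^{ut} cos(vt + theta), t in years."""
    return 2.0 * math.pi / v


# v of the first oscillatory component as printed in Table I
print(wave_length_years(21.448e-2))
```

This comes to a little over 29 years, of the same order as the generation length of about 28 years quoted in the text.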

Alternative procedure. The procedure set forth in the preceding sections
involves not only arbitrary assumptions regarding the values of $p(a)$ and $m(a)$
for “future” time, which are fundamental to the problem under consideration,
but involves further incidental assumptions regarding their values prior to the
“initial” condition at the instant denoted by $t = \omega$. These incidental
assumptions are in a sense superfluous, since the future history of the system is
completely determined by the initial age distribution and the assumed “future”
values of $p(a)$ and $m(a)$. The additional assumptions were introduced merely
for the purpose of translating the initial age distribution into a series of values of
$B(t)$ for $0 < t < \omega$, i.e., prior to the given initial age distribution.

In actual fact the age distribution at time $t = \omega$ did not arise in the manner
assumed; actually both $p(a)$ and $m(a)$ undoubtedly varied in the period 1865
to 1920, and migration also affected the situation. The quantity $F(t)$
introduced in equation (10) is, in fact, a purely auxiliary function having no direct
relation to the biological events at time $t < \omega$.

An alternative procedure which would avoid these conflicts, and introduce
assumptions only regarding “future” values of $p(a)$ and $m(a)$, would be to
compute $B(t)$ step by step over the period from $B(1920)$ to $B(1920 + \omega) = B(1975)$.

Placing the zero of time at 1920, this would give $B(t)$ for $0 < t < \omega$. For
$t > \omega$ we should have, simply,

$$B(t) = \int_0^\omega B(t-a)\, \varphi_{1920}(a)\, da, \qquad t > \omega,$$

using, in the evaluation of the integral, the values of $B(t-a)$ obtained by the
step by step process.

We could here also split the integral into two parts

$$B(t) = \int_t^\omega B(t-a)\, \varphi_{1920}(a)\, da + \int_0^t B(t-a)\, \varphi_{1920}(a)\, da$$

$$\phantom{B(t)} = F(t) + \int_0^t B(t-a)\, \varphi_{1920}(a)\, da.$$

But the function $\varphi_{1920}(a)$ is now the same in the two integrals, and there is no
occasion, in this case, for distinguishing the two parts of the integral.

If this procedure is adopted, its application to the course of $B(t)$ for $t > \omega$,
i.e., beyond 1975, is of minor interest, for by that time it has practically settled
down to the aperiodic (exponential) component, the oscillations being greatly
damped down. The major interest in the result of a computation carried out
by this procedure would be in the fitting of a series of the form (14) to the
function $B(t)$ in the range 1920 to 1975, which, in this setting, figures as a known
“arbitrary” function.

Of the two alternative procedures the one carried out in detail in the text
and the numerical example is of greater interest, as exhibiting in greater
generality the application of the Hertz-Herglotz solution.

BIBLIOGRAPHY 

A few more titles may be added to previously published bibliographies (see these Annals,
Vol. 10 (1939), p. 22; Vol. 12 (1941), p. 266).

1935 W. Möschler, “Untersuchungen über Eintrittsgewinn und Fehlbetrag einer
Versicherungskasse,” Bull. de l'Association des Actuaires Suisses, October (1935),
p. 129.

1936 Ali Afzalipour, Contribution à l'étude de la théorie mathématique de la démographie,
Thèse, 1936.

1938 H. Hadwiger, “Ein Konvergenzkriterium für Erneuerungszahlen,” Skandinavisk
Aktuarietidskrift, Vol. 3-4, p. 226.

H. Hess, Anwendung der logistischen Funktion in der mathematischen
Bevölkerungstheorie, Inaugural Dissertation der philosophischen Fakultät der Universität
Bern.

W. M. Dawson and W. Edwards Deming, “On the problem of natural increase,”
Growth, Vol. 2, p. 319.

1940 A. W. Brown, “A note on the use of a Pearson Type III function in renewal theory,”
Annals of Math. Stat., Vol. 11 (1940), p. 448.

1940 E. Zwinggi, “Entwicklung von Personengesamtheiten - Zusammenfassender
Bericht,” Twelfth International Congress of Actuaries, Vol. 3, p. 263.

1940 E. Keinänen, “Über die Altersverteilung der Bevölkerung,” Twelfth International
Congress of Actuaries, Vol. 3, p. 305.

1940 S. S. Townsend, “Some observations on the internal variation in groups of lives
assured in industrial assurance and in the general population of England and
Wales,” Twelfth International Congress of Actuaries, Vol. 3, p. 319.

1940 R. Tarjan, “Untersuchungen über den Kapitalbedarf des
Lebensversicherungsgeschäftes,” Twelfth International Congress of Actuaries, Vol. 3, p. 335.

1940 M. Presburger, “Sur l'étude générale des collectivités de personnes,” Twelfth
International Congress of Actuaries, Vol. 3, p. 363.

1940 A. Maret, “Direkte Berechnung der Vorgangsfunktionen einer offenen Gesamtheit,”
Twelfth International Congress of Actuaries, Vol. 3, p. 387.

1940 E. Zwinggi, “Über Zusammenhänge zwischen der technischen Stabilität einer
Sozialversicherungskasse und der Entwicklungsformel für den
Versichertenbestand,” Twelfth International Congress of Actuaries, Vol. 3, p. 396.

1940 H. Hadwiger and W. Wegmüller, “Entwicklung und Umschichtung von
Personengesamtheiten,” Twelfth International Congress of Actuaries, p. 369.

1940 W. Dobbernack and G. Tietz, “Die Entwicklung von Personengesamtheiten vom
Standpunkt der Sozialversicherungstechnik,” Zwölfter Internationaler Kongress
der Versicherungs-Mathematiker, Vol. 4, p. 233.

1941 R. C. Geary, “Irish population prospects considered from the viewpoint of
reproduction rates,” Statistical and Social Inquiry Society of Ireland, 1941.

1941 H. Hadwiger, “Eine Formel der mathematischen Bevölkerungstheorie,” Mitteilungen
der Vereinigung Schweizerischer Versicherungsmathematiker, Vol. 41, p. 67.

L. Féraud, “Le renouvellement, quelques problèmes connexes et les équations
intégrales du cycle fermé,” Mitteilungen der Vereinigung Schweizerischer
Versicherungsmathematiker, Vol. 41, p. 81.

W. Feller, “On the integral equation of the renewal theory,” Annals of Math.
Stat., Vol. 12 (1941), p. 243.

Harro Bernardelli, “Population waves,” Journal of the Burma Research Society,
Vol. 31.



ASYMPTOTICALLY SHORTEST CONFIDENCE INTERVALS¹

By Abraham Wald²

Columbia University 

The theory of confidence intervals, based on the classical theory of
probability, has been treated by J. Neyman.³ While Neyman considers the case of
small samples, we shall deal here with the limit properties of the confidence
intervals if the number of observations approaches infinity.

1. Definitions. We will start with some of Neyman's definitions. Let
$f(x, \theta)$ be the probability density function of a variate $x$ involving an unknown
parameter $\theta$. Denote by $E_n$ a point of the $n$-dimensional sample space of $n$
independent observations on $x$. If $\rho(E_n)$ denotes for each $E_n$ a subset of the
real axis, the symbol $P[\rho(E_n)\, c\, \theta' \mid \theta'']$ will denote the probability that $\rho(E_n)$
contains $\theta'$ under the hypothesis that $\theta''$ is the true value of the parameter. Let
$\underline{\theta}(E_n)$ and $\bar\theta(E_n)$ be two real functions defined over the whole sample space such
that $\underline{\theta}(E_n) \le \bar\theta(E_n)$. The interval $\delta(E_n) = [\underline{\theta}(E_n), \bar\theta(E_n)]$ is called a confidence
interval of $\theta$ corresponding to the confidence coefficient $\alpha$ ($0 < \alpha < 1$) if
$P[\delta(E_n)\, c\, \theta \mid \theta] = \alpha$ for all values of $\theta$.

The interval function $\delta(E_n)$ is called a shortest confidence interval of $\theta$
corresponding to the confidence coefficient $\alpha$ if

(a) $P[\delta(E_n)\, c\, \theta \mid \theta] = \alpha$ for all values of $\theta$, and

(b) for any interval function $\delta'(E_n)$ which satisfies the condition (a) we have

$$P[\delta(E_n)\, c\, \theta' \mid \theta''] \le P[\delta'(E_n)\, c\, \theta' \mid \theta'']$$

for arbitrary values $\theta'$ and $\theta''$.

The interval function $\delta(E_n)$ is called a shortest unbiased confidence interval
of $\theta$ if the following three conditions are fulfilled:

(a) $P[\delta(E_n)\, c\, \theta \mid \theta] = \alpha$ for all values of $\theta$.

(b) $P[\delta(E_n)\, c\, \theta' \mid \theta''] \le \alpha$ for all values of $\theta'$ and $\theta''$.

(c) For any interval function $\delta'(E_n)$ for which the conditions (a) and (b) are
satisfied, we have

$$P[\delta(E_n)\, c\, \theta' \mid \theta''] \le P[\delta'(E_n)\, c\, \theta' \mid \theta'']$$

for all values of $\theta'$ and $\theta''$.

For any relation $R$ we shall denote by $P(R \mid \theta)$ the probability that $R$ holds
under the hypothesis that $\theta$ is the true value of the parameter. Similarly for
¹ Presented at a joint meeting of the Institute of Mathematical Statistics and the
American Mathematical Society in Hanover, September, 1940.

² Research under a grant-in-aid from the Carnegie Corporation of New York.

³ J. Neyman, “Outline of a theory of statistical estimation based on the classical theory
of probability,” Phil. Trans. Roy. Soc. London, Vol. 236 (1937), pp. 333-380.



any region $Q_n$ of the $n$-dimensional sample space the symbol $P(Q_n \mid \theta)$ will denote
the probability that the sample point falls in $Q_n$ under the hypothesis that $\theta$
is the true value of the parameter.

In all that follows we shall denote a region of the n-dimensional sample space 
by a capital letter with the subscript n. 

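A real function $\bar\theta(E_n)$ is called a best upper estimate of $\theta$ if the following two
conditions are fulfilled:

(a) $P[\theta \le \bar\theta(E_n) \mid \theta] = \alpha$ for all values of $\theta$.

(b) For any function $\bar\theta'(E_n)$ which satisfies the condition (a) we have

$$P[\theta' \le \bar\theta(E_n) \mid \theta''] \le P[\theta' \le \bar\theta'(E_n) \mid \theta'']$$

for all values $\theta'$ and $\theta''$ for which $\theta' > \theta''$.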

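A real function $\underline{\theta}(E_n)$ is called a best lower estimate of $\theta$ if the following two
conditions are fulfilled:

(a) $P[\theta \ge \underline{\theta}(E_n) \mid \theta] = \alpha$ for all values of $\theta$.

(b) For any function $\underline{\theta}'(E_n)$ which satisfies the condition (a) we have

$$P[\theta' \ge \underline{\theta}(E_n) \mid \theta''] \le P[\theta' \ge \underline{\theta}'(E_n) \mid \theta'']$$

for all values of $\theta'$ and $\theta''$ for which $\theta' < \theta''$.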

We will extend the above definitions of Neyman to the limit case when n 
approaches infinity. 

Definition I: A sequence of interval functions $\{\delta_n(E_n)\}$ ($n = 1, 2, \cdots$, ad inf.) is
called an asymptotically shortest confidence interval of $\theta$ if the following two conditions
are fulfilled:

(a) $P[\delta_n(E_n)\, c\, \theta \mid \theta] = \alpha$ for all values of $\theta$.

(b) For any sequence of interval functions $\{\delta'_n(E_n)\}$ ($n = 1, 2, \cdots$, ad inf.)
which satisfies (a), the least upper bound of

$$P[\delta_n(E_n)\, c\, \theta' \mid \theta''] - P[\delta'_n(E_n)\, c\, \theta' \mid \theta'']$$

with respect to $\theta'$ and $\theta''$ converges to zero as $n \to \infty$.

Definition II: A sequence of interval functions $\{\delta_n(E_n)\}$ is called an
asymptotically shortest unbiased confidence interval of $\theta$ if the following three conditions
are fulfilled:

(a) $P[\delta_n(E_n)\, c\, \theta \mid \theta] = \alpha$ for all values of $\theta$.

(b) The least upper bound of $P[\delta_n(E_n)\, c\, \theta' \mid \theta'']$ with respect to $\theta'$ and $\theta''$ converges
to $\alpha$ with $n \to \infty$.

(c) For any sequence of interval functions $\{\delta'_n(E_n)\}$ which satisfies the conditions
(a) and (b), the least upper bound of

$$P[\delta_n(E_n)\, c\, \theta' \mid \theta''] - P[\delta'_n(E_n)\, c\, \theta' \mid \theta'']$$

with respect to $\theta'$ and $\theta''$ converges to zero with $n \to \infty$.

Definition III: A sequence of real functions $\{\bar\theta_n(E_n)\}$ ($n = 1, 2, \cdots$, ad inf.)
is called an asymptotically best upper estimate of $\theta$ if the following two conditions
are fulfilled:

(a) $P[\theta \le \bar\theta_n(E_n) \mid \theta] = \alpha$ for all values of $\theta$.

(b) For any sequence of functions $\{\bar\theta'_n(E_n)\}$ which satisfies (a) the least upper
bound of

$$P[\theta' \le \bar\theta_n(E_n) \mid \theta''] - P[\theta' \le \bar\theta'_n(E_n) \mid \theta'']$$

in the domain $\theta' > \theta''$ converges to zero with $n \to \infty$.

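Definition IV: A sequence of real functions $\{\underline{\theta}_n(E_n)\}$ is called an
asymptotically best lower estimate of $\theta$ if the following two conditions are fulfilled:

(a) $P[\theta \ge \underline{\theta}_n(E_n) \mid \theta] = \alpha$ for all values of $\theta$.

(b) For any sequence of functions $\{\underline{\theta}'_n(E_n)\}$ which satisfies (a) the least upper
bound of

$$P[\theta' \ge \underline{\theta}_n(E_n) \mid \theta''] - P[\theta' \ge \underline{\theta}'_n(E_n) \mid \theta'']$$

in the domain $\theta' < \theta''$ converges to zero with $n \to \infty$.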

2. Two Propositions. Proposition I: Let $\{W_n(\theta)\}$ ($n = 1, 2, \cdots$, ad inf.)
be for each $\theta$ a sequence of regions such that the following two conditions are fulfilled:

(a) $P[W_n(\theta) \mid \theta] = 1 - \alpha$ for all values of $\theta$.

(b) For any sequence of regions $\{Z_n(\theta)\}$ which satisfies (a) the least upper bound of

$$P[Z_n(\theta') \mid \theta''] - P[W_n(\theta') \mid \theta'']$$

in the domain $\theta' > \theta''$ ($\theta' < \theta''$) converges to zero with $n \to \infty$.

Denote by $\rho_n(E_n)$ the set of all values of $\theta$ for which $E_n$ does not lie in $W_n(\theta)$. Then
we have

(c) $P[\rho_n(E_n)\, c\, \theta \mid \theta] = \alpha$ for all values of $\theta$.

(d) For any sequence of set functions $\{\rho'_n(E_n)\}$ which satisfies (c), the least upper
bound of

$$P[\rho_n(E_n)\, c\, \theta' \mid \theta''] - P[\rho'_n(E_n)\, c\, \theta' \mid \theta'']$$

in the domain $\theta' > \theta''$ ($\theta' < \theta''$) converges to zero with $n \to \infty$.

Proposition II: Let $\{W_n(\theta)\}$ be for each $\theta$ a sequence of regions such that the
following three conditions are fulfilled:

(a) $P[W_n(\theta) \mid \theta] = 1 - \alpha$ for all values of $\theta$.

(b) The greatest lower bound of $P[W_n(\theta') \mid \theta'']$ converges to $1 - \alpha$ with $n \to \infty$.

(c) For any sequence $\{W'_n(\theta)\}$ which satisfies (a) and (b), the least upper bound of

$$P[W'_n(\theta') \mid \theta''] - P[W_n(\theta') \mid \theta'']$$

with respect to $\theta'$ and $\theta''$ converges to 0 with $n \to \infty$.

Denote by $\rho_n(E_n)$ the set of all values of $\theta$ for which $E_n$ does not lie in $W_n(\theta)$. Then
we have

(d) $P[\rho_n(E_n)\, c\, \theta \mid \theta] = \alpha$ for all values of $\theta$.

(e) The least upper bound of $P[\rho_n(E_n)\, c\, \theta' \mid \theta'']$ converges to $\alpha$ with $n \to \infty$.

(f) For any sequence of set functions $\{\rho'_n(E_n)\}$ which satisfies (d) and (e), the least
upper bound of

$$P[\rho_n(E_n)\, c\, \theta' \mid \theta''] - P[\rho'_n(E_n)\, c\, \theta' \mid \theta'']$$

with respect to $\theta'$ and $\theta''$ converges to 0 with $n \to \infty$.



The validity of the above propositions follows easily from the identity

$$P[\rho_n(E_n)\, c\, \theta' \mid \theta''] = 1 - P[W_n(\theta') \mid \theta''].$$


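3. Assumptions on the probability density function. For any function
$\psi(x)$ denote by $E_\theta \psi(x)$ the expected value of $\psi(x)$ under the assumption that $\theta$
is the true value of the parameter, i.e.

$$E_\theta \psi(x) = \int_{-\infty}^{+\infty} \psi(x) f(x, \theta)\, dx.$$

For any $x$, for any positive $\delta$, and for any real value $\theta'$ denote by $\varphi_1(x, \theta', \delta)$ the
greatest lower bound, and by $\varphi_2(x, \theta', \delta)$ the least upper bound of
$\frac{\partial^2}{\partial\theta^2} \log f(x, \theta)$ in the interval $\theta' - \delta \le \theta \le \theta' + \delta$.

Throughout this paper the following assumptions on $f(x, \theta)$ will be made:

Assumption I: The expectation $E_{\theta'} \frac{\partial}{\partial\theta''} \log f(x, \theta'')$ is a continuous function of
$\theta'$ and $\theta''$, and for any pair of sequences $\{\theta'_n\}$ and $\{\theta''_n\}$ ($n = 1, 2, \cdots$, ad inf.)
for which

$$\lim_{n = \infty} E_{\theta'_n} \frac{\partial}{\partial\theta''_n} \log f(x, \theta''_n) = 0$$

also

$$\lim_{n = \infty} (\theta'_n - \theta''_n) = 0.$$

Furthermore $E_{\theta'} \left[\frac{\partial}{\partial\theta''} \log f(x, \theta'')\right]^2$ is a bounded function of $\theta'$ and $\theta''$, and
$E_\theta \left[\frac{\partial}{\partial\theta} \log f(x, \theta)\right]^2 = d(\theta)$ has a positive lower bound.

Assumption II: There exists a positive value $k_0$ such that the expectations
$E_{\theta'} \varphi_1(x, \theta'', \delta)$ and $E_{\theta'} \varphi_2(x, \theta'', \delta)$ are uniformly continuous functions of $\theta'$, $\theta''$ and
$\delta$, where $\delta$ takes only values for which $|\delta| \le k_0$. Furthermore it is assumed that
$E_{\theta'} [\varphi_i(x, \theta'', \delta)]^2$ ($i = 1, 2$) are bounded functions of $\theta'$, $\theta''$ and $\delta$ ($|\delta| \le k_0$).

Assumption III: The relations

$$\int_{-\infty}^{+\infty} \frac{\partial}{\partial\theta} f(x, \theta)\, dx = \int_{-\infty}^{+\infty} \frac{\partial^2}{\partial\theta^2} f(x, \theta)\, dx = 0$$

hold.

The above assumption means simply that we may differentiate with respect
to $\theta$ under the integral sign. In fact

$$\int_{-\infty}^{+\infty} f(x, \theta)\, dx = 1$$

identically in $\theta$. Hence

$$\frac{\partial}{\partial\theta} \int_{-\infty}^{+\infty} f(x, \theta)\, dx = \frac{\partial^2}{\partial\theta^2} \int_{-\infty}^{+\infty} f(x, \theta)\, dx = 0.$$

Differentiating under the integral sign, we obtain the relations in Assumption III.

Assumption IV: There exists a positive $\eta$ such that [expression not recoverable from the scan]
is a bounded function of $\theta$.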

4. Some theorems. The assumptions on $f(x, \theta)$ made in this paper become
identical with the assumptions I-IV formulated in a previous paper⁴ if a certain
set $\omega$ involved in those assumptions is put equal to the whole real axis
$(-\infty, +\infty)$. Hence we can make use of all results obtained in that paper
putting $\omega = (-\infty, +\infty)$. Among others, the following statements have been
proved there.

(A) Denote $\frac{1}{\sqrt{n}} \sum_{i=1}^{n} \frac{\partial}{\partial\theta} \log f(x_i, \theta)$ by $y_n(\theta, E_n)$ and let $R_n(\theta)$ be the region
defined by the inequality $y_n(\theta, E_n) \ge A_n(\theta)$ where $A_n(\theta)$ is chosen such
that $P[R_n(\theta) \mid \theta] = 1 - \alpha$. Then for any sequence of regions $\{Z_n(\theta)\}$ for
which $P[Z_n(\theta) \mid \theta] = 1 - \alpha$, the least upper bound of

$$P[Z_n(\theta') \mid \theta''] - P[R_n(\theta') \mid \theta'']$$

in the set $\theta'' > \theta'$ converges to 0 with $n \to \infty$.

(B) Let $S_n(\theta)$ be the region defined by the inequality $y_n(\theta, E_n) \le B_n(\theta)$ where
$B_n(\theta)$ is defined such that $P[S_n(\theta) \mid \theta] = 1 - \alpha$. Then for any sequence of
regions $\{Z_n(\theta)\}$ for which $P[Z_n(\theta) \mid \theta] = 1 - \alpha$, the least upper bound of

$$P[Z_n(\theta') \mid \theta''] - P[S_n(\theta') \mid \theta'']$$

in the set $\theta'' < \theta'$ converges to 0 with $n \to \infty$.

(C) Denote by $T_n(\theta)$ the region defined by $|y_n(\theta, E_n)| \ge C_n(\theta)$ where $C_n(\theta)$
is chosen such that

(a) $P[T_n(\theta) \mid \theta] = 1 - \alpha$.

Then $T_n(\theta)$ satisfies also the following two conditions:

(b) The greatest lower bound of $P[T_n(\theta') \mid \theta'']$ converges to $1 - \alpha$ with
$n \to \infty$.

(c) For any sequence of regions $\{Z_n(\theta)\}$ which satisfies (a) and (b), the
least upper bound of

$$P[Z_n(\theta') \mid \theta''] - P[T_n(\theta') \mid \theta'']$$

converges to 0 with $n \to \infty$.

⁴ A. Wald, “Some examples of asymptotically most powerful tests,” Annals of Math.
Stat., Vol. 12 (1941), pp. 396-408.





On account of Propositions I and II we easily get the following theorems:

Theorem I: Denote by $\xi_n(E_n)$ the set of all values of $\theta$ for which $y_n(\theta, E_n) < A_n(\theta)$,
where $A_n(\theta)$ is defined such that $P[y_n(\theta, E_n) \ge A_n(\theta) \mid \theta] = 1 - \alpha$. Then
$\xi_n(E_n)$ satisfies the following two conditions:

(a) $P[\xi_n(E_n)\, c\, \theta \mid \theta] = \alpha$ for all values of $\theta$.

(b) For any sequence of set functions $\{\xi'_n(E_n)\}$ which satisfies the condition (a),
the least upper bound of

$$P[\xi_n(E_n)\, c\, \theta' \mid \theta''] - P[\xi'_n(E_n)\, c\, \theta' \mid \theta'']$$

in the set $\theta'' > \theta'$ converges to 0 with $n \to \infty$.

Theorem II: Denote by $\zeta_n(E_n)$ the set of all values of $\theta$ for which $y_n(\theta, E_n) > B_n(\theta)$,
where $B_n(\theta)$ is defined such that $P[y_n(\theta, E_n) \le B_n(\theta) \mid \theta] = 1 - \alpha$. Then
$\zeta_n(E_n)$ satisfies the following two conditions:

(a) $P[\zeta_n(E_n)\, c\, \theta \mid \theta] = \alpha$ for all values of $\theta$.

(b) For any sequence of set functions $\{\zeta'_n(E_n)\}$ which satisfies the condition (a),
the least upper bound of

$$P[\zeta_n(E_n)\, c\, \theta' \mid \theta''] - P[\zeta'_n(E_n)\, c\, \theta' \mid \theta'']$$

in the set $\theta'' < \theta'$ converges to 0 with $n \to \infty$.

Theorem III: Denote by $\rho_n(E_n)$ the set of all values of $\theta$ for which $|y_n(\theta, E_n)| < C_n(\theta)$,
where $C_n(\theta)$ is chosen such that $P[\,|y_n(\theta, E_n)| \ge C_n(\theta) \mid \theta] = 1 - \alpha$. Then
$\rho_n(E_n)$ satisfies the following three conditions:

(a) $P[\rho_n(E_n)\, c\, \theta \mid \theta] = \alpha$ for all values of $\theta$.

(b) The least upper bound of $P[\rho_n(E_n)\, c\, \theta' \mid \theta'']$ converges to $\alpha$ with $n \to \infty$.

(c) For any sequence of set functions $\{\rho'_n(E_n)\}$ which satisfies the conditions (a)
and (b), the least upper bound of

$$P[\rho_n(E_n)\, c\, \theta' \mid \theta''] - P[\rho'_n(E_n)\, c\, \theta' \mid \theta'']$$

converges to zero with $n \to \infty$.
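The construction in Theorem III, taking the set of parameter values for which the sample does not fall in the critical region $|y_n(\theta, E_n)| \ge C_n(\theta)$, can be illustrated for a normal mean with known standard deviation. This is a sketch under assumptions not made in the paper: the normal model, the reading of $y_n$ as $n^{-1/2}\sum \partial \log f(x_i,\theta)/\partial\theta$, and all function names are chosen for illustration. In this model $y_n = \sqrt{n}(\bar x - \theta)/\sigma^2$ is exactly normal with variance $1/\sigma^2$, so $C_n$ is available in closed form.

```python
import math
from statistics import NormalDist


def score_statistic(theta, xs, sigma):
    """y_n(theta, E_n) = n^{-1/2} * sum of d/dtheta log f(x_i, theta), N(theta, sigma^2)."""
    n = len(xs)
    return sum(x - theta for x in xs) / (sigma ** 2 * math.sqrt(n))


def critical_value(sigma, conf):
    """C with P[|y_n| >= C | theta] = 1 - conf; here y_n ~ N(0, 1/sigma^2)."""
    return NormalDist().inv_cdf(0.5 * (1.0 + conf)) / sigma


def in_confidence_set(theta, xs, sigma, conf):
    """True when the sample does not fall in the critical region |y_n| >= C."""
    return abs(score_statistic(theta, xs, sigma)) < critical_value(sigma, conf)


def confidence_interval(xs, sigma, conf):
    """Closed form of the set {theta : |y_n(theta, E_n)| < C} for this model."""
    n = len(xs)
    xbar = sum(xs) / n
    half_width = critical_value(sigma, conf) * sigma ** 2 / math.sqrt(n)
    return xbar - half_width, xbar + half_width


# Made-up sample for illustration; conf plays the role of the confidence coefficient.
print(confidence_interval([1.0, 2.0, 3.0, 4.0], sigma=2.0, conf=0.95))
```

Because $y_n$ is monotone in $\theta$ here, the inverted set is an interval, the familiar $\bar x \pm z\,\sigma/\sqrt{n}$.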

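Now we shall investigate the question whether the sets $\xi_n(E_n)$, $\zeta_n(E_n)$ and
$\rho_n(E_n)$ are intervals. For this purpose we will prove some propositions.

Proposition III: Let $\epsilon$ and $D$ be two positive numbers such that $\epsilon < D$. Denote
by $Q_n(\theta, \epsilon, D)$ the region which consists of all points $E_n$ for which

$$y_n(\theta + \epsilon', E_n) < -n^{[?]}, \quad \text{and} \quad y_n(\theta - \epsilon', E_n) > n^{[?]}$$

for all values $\epsilon'$ in the interval $[\epsilon, D]$ (the exponent of $n$, marked [?], is illegible in the scan). Then we have

$$\lim_{n = \infty} P[Q_n(\theta, \epsilon, D) \mid \theta] = 1 \tag{1}$$

uniformly in $\theta$.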

Proof; Let ei, tj be a sequence of points in the interval («, D] such 

that *1 “ e = t 2 — ei = • • • = £, — = X> _ =« A:<i (say), where r is chosen 

sufficiently large such that Assumption II holds for [ S j < ke . Denote by 
t,) the region in which 

( 2 ) 


Vnie -f «., E„) < -nK 



We will show that 
(3) 




lim P[Rn{6, t.) I 9] = 1 

n MOO 

uniformly in 6. 

From Assumption I it follows that the greatest lower bound of 

Et~logKx,e + «') 

with regard to t' in the interval [«, D] is positive. Let this greatest lower bound 
be A > 0. Since on account of Assumption lEt— log f(x, 6 + e') is a continu- 
ous function of it does not change sign in the interval e < t' < D. Since 

r 3 

this is true for arbitrarily small c and since Ei — log f(x,6) ~ ~Et ^ log 

Lop 

f{x, 8) has a positive lower bound (Assumption I), it follows easily on account of 
Assumption II that , 


Et ^ log /(*, 9 + «0 < 0- 

du 


Hence 


(4) E$ ^ log/(x, 8 + (') < —A <0 for t < t' < D, 

dO 

and therefore 

(6) Etyn(8 + t', E„) < -A-y/n for t < t' < D. 

From Assumption II it follows that the variance of y„(9 + e', jE?„) is a bounded 
function of 8 and Hence 

(6) lim P[yn(9 + a, En) < — hAs/n 19] = 1 


uniformly in 9. The equation (3) is a consequence of (6). 
Denote by (S„(9, e.) the region in which 


— y ‘'j j 9 “h €i ) fco) 
n a 


< C 


(i = 1, 2) 


where C is greater than the least upper bound of ] Ei(pi(x, 8', h) \ with respect 
to 9 and 9'. Then we have on account of Assumption II: 


(7) lim P[5„(9, e*) | 9] = 1 (i = 1, 2, • • • , r) 


uniformly in 9. In the region S„(9, *{) we obviously have 
(8) yn(9 + ej , En) < yn{S + e,, En) + 2kt,\/nC 





for all values in the interval [«,- — A'o, £. + h]. By choosing r suffioiently 
large we can always achieve that 

2hC 

Denote by Tn{&, «<) the region in which 

(9) y^id + ti , En) < —^ Vn for << — Ab < < « + fo. 

From (6), (7) and (8) we get 

(10) lim P(r„(e, £.) I e] » 1 

uniformly in 6. Lot «i E) he the common part of the r regions 

3’n(9j «i)) ’ ■ • , Tn(&, <r)) i.c. Qni^) <) E) IS tliD sot of all points E„ for which 

y„(e + £', EJ < Vn 

for all t' in the interval [«, D]. Since r is a fixed positive integer not depending 
on n, we get from (10) 

(11) lim P[<2;(0 ,«,D)|<?1 » 1 
uniformly in 6. 

In the same way we can prove that 

(12) lim PlQ'^id, £, D) 1 0] » 1 

nw>w 

uniformly in 6, where Q'^{6, <, D) denotes the region in which 

VniB — En) > - Vn for all t' in U, D], 

Proposition III follows from (11) and (12). 

Proposition IV: Denote by Vn{6, «) the region in which 

En) < - 71 * 

/or all values 0' in ike interval [d - t, e + e]. There exists a positwe e such that 

lim PlVn(e ,«) 10] = 1 

uniformly in 6. 

Proof: Since the least upper bound of Em(.x, $, 0) is <0, we get from 
Assumption II that the least upper bound of Et^(x, 0, «) is <0 for sufficiently 





small £ > 0. Denote tlie least upper bound of 6, e) by —B and let the 

region in which 

1 X-v 

- > ^) «) < —iE 

n a 

be denoted by e). From Assumption II it follows that 

lim P[Fn(«, c) I e] = 1 

n 

uniformly in $. Since for almost all n TF„(d, «) is a subset of Vn{0, t). Proposi¬ 
tion IV is proved. 

Proposition V: Let $A_n(\theta)$, $B_n(\theta)$, $C_n(\theta)$ be the functions as defined in Theorems
I-III. There exists a finite value $G$ such that

$$|A_n(\theta)| < G, \qquad |B_n(\theta)| < G \qquad \text{and} \qquad |C_n(\theta)| < G$$

for all $\theta$ and all $n$.

Proposition V follows easily from the fact that the variance of $y_n(\theta, E_n)$ is a
bounded function of $n$ and $\theta$.

Let $D$ be an arbitrary positive number and denote by $W_n(\theta, D)$ the region
consisting of all points $E_n$ for which the following conditions are fulfilled:

(a) The equation $y_n(\theta', E_n) = A_n(\theta')$ has exactly one root in $\theta'$ which lies in the
interval $[\theta - D, \theta + D]$.

(b) The equation $y_n(\theta', E_n) = B_n(\theta')$ has exactly one root in $\theta'$ which lies in
the interval $[\theta - D, \theta + D]$.

(c) The equation $y_n(\theta', E_n) = C_n(\theta')$ has exactly one root in $\theta'$ which lies in
the interval $[\theta - D, \theta + D]$.

(d) The equation $y_n(\theta', E_n) = -C_n(\theta')$ has exactly one root in $\theta'$ which lies in
the interval $[\theta - D, \theta + D]$.

(e) The common part oi [B — D, B D] and {„(£'„) is the interval [0l(P„), D] 
where 6n{En) denotes the root of the equation in (a). 

(f) The coinmon part of f„(£?„) and [B D, B + .D] is the interval [-D, 
where B„ (B„) denotes the root of the equation in (b). 

(g) The coupon part of p„(Z?„) and [B ~ D, 6 + D] is the interval 
[Bn{E„), 6n{E„)] where 6n{E„) denotes the root of the equation in (c) and 
6niE„) denotes the root of the equation in (d). 

From Propositions III-V follows easily the following

Proposition VI: For any positive value $D$

$$\lim_{n \to \infty} P[W_n(\theta, D) \mid \theta] = 1,$$

uniformly in $\theta$, provided that the functions $A_n(\theta)$, $B_n(\theta)$ and $C_n(\theta)$ are continuous
and of bounded variation in any finite interval.

We will show that Proposition VI remains valid for $D = +\infty$, if we make the
following




Assumption V: Denote by $\psi(x, \theta, D)$ the least upper bound of $\frac{\partial}{\partial\theta'} \log f(x, \theta')$
with respect to $\theta'$ where $\theta' \ge \theta + D$. Denote furthermore by $\psi^*(x, \theta, D)$ the greatest
lower bound of $\frac{\partial}{\partial\theta'} \log f(x, \theta')$ with respect to $\theta'$ where $\theta' \le \theta - D$. There exists a
positive $D$ such that the least upper bound of $E_\theta\,\psi(x, \theta, D)$ with respect to $\theta$ is negative,
the greatest lower bound of $E_\theta\,\psi^*(x, \theta, D)$ with respect to $\theta$ is positive, and the variances
of $\psi(x, \theta, D)$ and $\psi^*(x, \theta, D)$ are bounded functions of $\theta$. (The variances are
calculated under the assumption that $\theta$ is the true value of the parameter.)

It follows easily from. Assumption V that 

lim P ,8,D) < —n* | 

= lim P ,9,D)> n* 1 *= 1 

uniformly in 8. 

Since 

-7= D Hx. ,e,D)> v„i8', Ef) for 8'> 8 + D 
Vn m 

and 

Zi)) ^ E.) for d. 

Proposition VI remains valid if we substitute 4“« for D. 

Hence we obtain the following 

Corollary: If the assumptions I-V are fulfilled and if $A_n(\theta)$, $B_n(\theta)$ and $C_n(\theta)$
are continuous and of bounded variation in any finite interval, then

(a) The root $\theta'_n(E_n)$ of the equation $y_n(\theta, E_n) = A_n(\theta)$ in $\theta$ is an asymptotically
best lower estimate of $\theta$.

(b) The root of the equation $y_n(\theta, E_n) = B_n(\theta)$ in $\theta$ is an asymptotically
best upper estimate of $\theta$.

(c) The interval $[\underline{\theta}_n(E_n), \bar{\theta}_n(E_n)]$ is an asymptotically shortest unbiased confidence
interval of $\theta$, where $\underline{\theta}_n(E_n)$ denotes the root of the equation $y_n(\theta, E_n) = C_n(\theta)$
and $\bar{\theta}_n(E_n)$ denotes the root of the equation $y_n(\theta, E_n) = -C_n(\theta)$.

6. Some Remarks. 1. I should like to make a few remarks about the relationship
of these results to those obtained by S. S. Wilks.* The definition of a
shortest confidence interval underlying Wilks' investigations is somewhat different
from that of Neyman's which has been used in this paper. According to
Wilks, a confidence interval $\delta(E_n)$ is called shortest in the average if the expected
value of the length of $\delta(E_n)$ is a minimum. The main result obtained by Wilks
can be formulated as follows: The confidence interval $[\underline{\theta}_n(E_n), \bar{\theta}_n(E_n)]$ given in
our Corollary is asymptotically shortest in the average compared with all confidence
intervals computed on the basis of functions belonging to a certain
class C. In the present paper no restriction to a certain class of functions has
been made.

* S. S. Wilks, "Shortest average confidence intervals from large samples," Annals of
Math. Stat., Vol. 9 (1938), pp. 166-176.

2. If the parameter space $\Omega$ is not the whole real axis, but an open subset of
it, and if the assumptions I-V are fulfilled when $\theta$ can take only values in $\Omega$,
the previously proved Corollary remains valid. If $\Omega$ is a bounded set, Assumption V
is a consequence of Assumptions I-IV.



GROUPING METHODS

By P. S. Dwyer

University of Michigan

1. Introduction. The conventional formulas for moment adjustments known
as Sheppard's corrections are not too satisfactory for practical use. As Carver
has pointed out [1] Sheppard's corrections are merely systematic adjustments
which eliminate the bias introduced by grouping. The values of the moments
after Sheppard's corrections have been applied may be looked upon as unbiased
grouping estimates of the true moments while the uncorrected values constitute
biased estimates.

In practice one obtains his moments from a single grouping. The application
of Sheppard's adjustments in such a case does not necessarily result in the
unbiased estimate being closer to the true moment than is the biased estimate and,
in an appreciable percentage of cases, the unbiased estimate is further from the
true moment than is the biased estimate. One does not know when he applies
Sheppard's adjustments to the results of a single grouping whether or not he is
making a correction in the right direction.

This situation is not too satisfactory and yet practical necessity demands
some method of grouping. The improvement of modern calculating machines
tends to push grouping techniques further into the background since, in many
cases, the machines permit the determination of the actual values of the moments
without grouping in a reasonable amount of time. But even here it is possible
to use grouping methods and to get a good estimate of the true value in a fraction
of the time. It is the purpose of this paper to present some new grouping
methods which are useful in obtaining much better unbiased estimates from a
single grouping than can be obtained with the use of Sheppard's corrections.
These methods demand additional work but this additional work is justified by
the additional precision resulting when such precision is desired.

The spirit of the new approach, which in one sense is a generalization of the
earlier approach, can be expressed very simply though the details of the development
and the calculational methods demand amplification. If we let $x$ = the
true value and $x'$ the grouped value (the value of the class mark of the group in
which $x$ is), and the error, $\epsilon$ = the difference between the true value and the
grouped value, then

$$\epsilon = x - x', \qquad x = x' + \epsilon, \qquad \text{and} \qquad x' = x - \epsilon. \tag{1}$$

In the classical theory we use $\Sigma x'^s$ as the biased grouping estimate of $\Sigma x^s$. In
the new methods we use $\Sigma x x'^{s-1}$ as the biased estimate or, if we desire more
precision, $\Sigma x^2 x'^{s-2}$ as the biased estimate. It is then possible to correct $\Sigma x x'^{s-1}$ for
grouping bias and to correct $\Sigma x^2 x'^{s-2}$ for grouping bias just as we now correct $\Sigma x'^s$
for grouping bias. It is also possible to use the values of $\Sigma x'^s$ and $\Sigma x x'^{s-1}$
in obtaining a better unbiased estimate or to use the values of $\Sigma x'^s$, $\Sigma x x'^{s-1}$,
and $\Sigma x^2 x'^{s-2}$ in obtaining a still better unbiased estimate of $\Sigma x^s$.

2. Illustration. The relative merits of the conventional method and the
proposed methods can be shown effectively by means of an illustration. For
this purpose I have selected the problem used previously [1, 154] in showing the
variations in grouped results. The power sums rather than the moments are
used and the origin is taken at a point near the mean so that the relative variations
are as large as possible. If the values of the power sums were "padded"
by measurement about zero, the relative variations would not appear as large.
However, a problem which shows considerable variation, and in the problem
under consideration the nine unbiased grouping estimates of $\Sigma x^3$ resulting
from the nine groupings do not even have the same sign, is an appropriate one
with which to demonstrate the improvements introduced by the new methods.

The problem consists of 244 discrete variates which range in value from 64
to 155. Carver took a class interval of nine and formed the nine frequency
distributions which result when class intervals of nine are chosen in all possible
ways. He computed the values of $\Sigma x'$, $\Sigma x'^2$, $\Sigma x'^3$, $\Sigma x'^4$ for each of the nine
distributions, corrected each for bias with the use of the Sheppard adjustments,
and showed that the averages of the nine corrected estimates are respectively
the values of $\Sigma x$, $\Sigma x^2$, $\Sigma x^3$, $\Sigma x^4$.

In Table I are presented the values of the biased and unbiased grouping
estimates of $\Sigma x$, $\Sigma x^2$, $\Sigma x^3$, $\Sigma x^4$, which result from the use of (1) $\Sigma x'^s$; (2) $\Sigma x x'^{s-1}$;
(3) $\Sigma x^2 x'^{s-2}$; (4) $\Sigma x'^s$, $\Sigma x x'^{s-1}$; and (5) $\Sigma x'^s$, $\Sigma x x'^{s-1}$, $\Sigma x^2 x'^{s-2}$. The results are
presented here for comparison only; the details of the computation are explained
later. Rows of biased estimates are indicated by B while the rows of unbiased
estimates are indicated by U. Parentheses are used to indicate entries which,
while appearing in rows of biased estimates, are actually unbiased. The exact
values of $\Sigma x^s$, when they appear, are indicated by underscoring. The Roman
numerals indicate the different frequency distributions while the grouping
methods are indicated by the values of (1), (2), (3), (4), (5) above. The true
values are $\Sigma x = -129$, $\Sigma x^2 = 77{,}591$, $\Sigma x^3 = -52{,}005$, $\Sigma x^4 = 69{,}239{,}951$ where the
values of $x$ used are the values of the original variates decreased by 105.

The information contained in Table I deserves more than cursory examination.
Study shows that the estimates resulting from method 2 are much closer to the
true values than are the estimates resulting from method 1, etc. Table II is
presented below in order to facilitate the comparison of the relative amounts of
grouping error involved in the different methods. The standard deviation of the
grouping error of the conventional method, method 1, is used as a norm and the
standard deviations of the grouping errors for the new methods are compared
with this norm.

The decline in the size of the error revealed in Table II indicates a decided 
decrease in grouping errors. Grouping method 2 enables one to compute the 
mean exactly and this is always possible when method 2 is applied to discrete 



TABLE I

Biased and Unbiased Grouping Estimates by Different Methods

Grouping  Method         Σx          Σx²           Σx³               Σx⁴
True value             -129       77,591       -52,005        69,239,951

I         1-B        (-181)       77,149      -134,191        69,063,265
I         2-B      (_-129_)     (76,593)      -105,105        68,267,577
I         3-B            --   (_77,591_)     (-77,825)        68,033,657
I         1-U          -181   75,522 1/3      -130,571        66,023,177
I         2-U        _-129_       76,593      -104,245        66,735,717
I         3-U            --     _77,591_       -77,825    67,516,383 2/3
I         4-U        _-129_   77,663 2/3       -49,513        69,001,817
I         5-U        _-129_     _77,591_       -52,351        69,193,537

II        1-B        (-218)       78,466       -54,602        74,519,962
II        2-B      (_-129_)     (77,181)       -52,977        72,296,367
II        3-B            --   (_77,591_)     (-52,465)        70,685,801
II        1-U          -218   76,839 1/3       -50,242        71,427,194
II        2-U        _-129_       77,181       -52,117        70,752,747
II        3-U            --     _77,591_       -52,465    70,168,527 2/3
II        4-U        _-129_   77,522 2/3       -52,307        68,770,406
II        5-U        _-129_     _77,591_       -53,066        69,246,172

III       1-B        (-111)       77,769         2,889        71,465,409
III       2-B      (_-129_)     (76,797)       -17,037        70,053,093
III       3-B            --   (_77,591_)     (-35,097)        69,214,545
III       1-U          -111   76,142 1/3         5,109        68,400,521
III       2-U        _-129_       76,797       -16,177        68,517,153
III       3-U            --     _77,591_       -35,097    68,697,271 2/3
III       4-U        _-129_   77,451 2/3       -59,469        68,945,609
III       5-U        _-129_     _77,591_       -51,291        69,201,169

IV        1-B        (-139)       79,747       -23,311        74,171,443
IV        2-B      (_-129_)     (77,790)       -34,464        72,095,664
IV        3-B            --   (_77,591_)     (-44,108)        70,592,774
IV        1-U          -139   78,120 1/3       -20,531        71,027,435
IV        2-U        _-129_       77,790       -33,604        70,539,864
IV        3-U            --     _77,591_       -44,108    70,075,500 2/3
IV        4-U        _-129_   77,459 2/3       -59,350        69,037,511
IV        5-U        _-129_     _77,591_       -52,243        69,248,077

V         1-B        (-104)       81,934        19,666        76,143,874
V         2-B      (_-129_)     (78,891)        -4,521        73,590,207
V         3-B            --   (_77,591_)     (-28,387)        71,610,053
V         1-U          -104   80,307 1/3        21,746        72,912,386
V         2-U        _-129_       78,891        -3,661        72,012,387
V         3-U            --     _77,591_       -28,387    71,092,779 2/3
V         4-U        _-129_   77,474 2/3       -55,475        69,142,430
V         5-U        _-129_     _77,591_       -51,932        69,312,700

VI        1-B         (-87)       80,145        16,551        72,467,541
VI        2-B      (_-129_)     (78,030)        -4,914        70,940,124
VI        3-B            --   (_77,591_)     (-27,714)        69,902,910
VI        1-U           -87   78,518 1/3        18,291        69,307,613
VI        2-U        _-129_       78,030        -4,054        69,379,524
VI        3-U            --     _77,591_       -27,714    69,385,636 2/3
VI        4-U        _-129_   77,541 2/3       -50,424        69,536,657
VI        5-U        _-129_     _77,591_       -51,849        69,241,507

VII       1-B         (-52)       80,302       -36,118        71,851,930
VII       2-B      (_-129_)     (78,168)       -39,486        70,515,354
VII       3-B            --   (_77,591_)     (-44,492)        69,647,462
VII       1-U           -52   78,675 1/3       -35,078        68,685,722
VII       2-U        _-129_       78,168       -38,626        68,951,994
VII       3-U            --     _77,591_       -44,492    69,130,188 2/3
VII       4-U        _-129_   77,660 2/3       -48,802        69,689,930
VII       5-U        _-129_     _77,591_       -51,136        69,260,146

VIII      1-B         (-89)       78,553      -101,357        68,426,497
VIII      2-B      (_-129_)     (77,352)       -82,416        67,959,816
VIII      3-B            --   (_77,591_)     (-65,788)        67,944,476
VIII      1-U           -89   76,926 1/3       -99,577        65,330,249
VIII      2-U        _-129_       77,352       -81,556        66,412,776
VIII      3-U            --     _77,591_       -65,788    67,427,202 2/3
VIII      4-U        _-129_   77,777 2/3       -47,114        69,711,437
VIII      5-U        _-129_     _77,591_       -51,473        69,210,235

IX        1-B        (-180)       78,894      -180,792        73,155,150
IX        2-B      (_-129_)     (77,517)      -134,865        71,407,737
IX        3-B            --   (_77,591_)     (-92,169)        70,183,341
IX        1-U          -180   77,267 1/3      -177,192        70,045,262
IX        2-U        _-129_       77,517      -134,005        69,857,397
IX        3-U            --     _77,591_       -92,169    69,666,067 2/3
IX        4-U        _-129_   77,766 2/3       -45,591        69,323,762
IX        5-U        _-129_     _77,591_       -52,704        69,246,016


data. There is also a corresponding decrease in the errors of the higher powers 
to roughly one-half, two-thirds, three-fourths. Greater precision in the case 
of the higher power sums can be obtained with the use of the other methods, 
though these methods demand more calculation. 

There is one more question which should be discussed before the general
theory is presented, and that deals with the method of computation of the
quantities $\Sigma x x'^{s-1}$, $\Sigma x^2 x'^{s-2}$, in methods 2 and 3. Computational techniques are
discussed in a later section of the paper, but enough should be given now to
make the meaning of $\Sigma x x'^{s-1}$ and $\Sigma x^2 x'^{s-2}$ clear. In getting $\Sigma x'^s$, we recall,
we need only the values of the class mark, $x'$, and the frequency associated with
each, $f_{x'}$. To get the values of $\Sigma x x'^{s-1}$ we need in addition to $x'$ the sum of
the $x$ values which are grouped together in the class having the same class mark,
$x'$. We denote this value by $x_{x'}$ and we use this instead of the $f_{x'}$ of the usual
method. In the case of method 3 we record $x^2_{x'}$, where $x^2_{x'}$ is the sum of the squares
of all $x$ values having the same grouped value $x'$.

Let us examine the first grouping in Table I. The original 244 variates were
recorded by Carver [1, 154] and he gave the values of $f_{x'}$ for each grouping.
It is necessary for us to return to these original variates, but instead of counting
the variates in a given group, we add them and we add their squares.

In obtaining the values for the first grouping in Table I the variates were
transformed with the use of $x = X - 105$. The variates then ranged from


TABLE II

Standard Deviations of the Grouping Errors of the Different Methods Expressed as Percentages
of the Standard Deviations of the Usual Method

Method       Σx      Σx²      Σx³      Σx⁴
1           100      100      100      100
2             0     48.9     65.5     74.3
3            --        0     32.1     49.1
4             0      8.8      7.3     13.8
5             0        0       .9      1.5


-41 to 50 and the frequency distribution was made with mid values $x' = -37$,
$-28$, $-19$, $-10$, etc. The values of $f_{x'}$, $x_{x'}$, and $x^2_{x'}$ were then computed and
recorded in columns 2, 3, 4 of Table III. The next three columns are
computational columns useful in obtaining the biased estimates recorded at the
bottom of Table III and also in Table I with the use of

$$\Sigma x'^s = \Sigma f_{x'}\, x'^s, \qquad \Sigma x x'^{s-1} = \Sigma x_{x'}\, x'^{s-1}, \qquad \Sigma x^2 x'^{s-2} = \Sigma x^2_{x'}\, x'^{s-2}.$$
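The three power sums just defined can be checked mechanically. The following is a minimal sketch, assuming the class summaries of the first grouping as they can be recovered from Table III (the tuples are class mark, frequency, sum of the variates, sum of their squares; entries illegible in the scan were filled in so that the column totals 244, -129 and 77,591 are reproduced). The names `classes` and `biased_sums` are illustrative only.

```python
# Class summaries for the first grouping (origin shifted by 105, class interval 9).
# Each tuple: (class mark x', frequency f, sum of x in class, sum of x^2 in class).
classes = [
    (53, 1, 50, 2500), (44, 1, 48, 2304), (35, 8, 287, 10351),
    (26, 16, 402, 10190), (17, 27, 475, 8515), (8, 45, 378, 3426),
    (-1, 53, -73, 369), (-10, 41, -386, 3924), (-19, 27, -507, 9701),
    (-28, 12, -338, 9594), (-37, 13, -465, 16717),
]

def biased_sums(classes, s_max=4):
    """Return the biased power sums of methods 1, 2 and 3, keyed by the power s."""
    # Method 1: sum of f * x'^s
    m1 = {s: sum(f * m**s for m, f, _, _ in classes) for s in range(0, s_max + 1)}
    # Method 2: sum of (class sum of x) * x'^(s-1)
    m2 = {s: sum(sx * m**(s - 1) for m, _, sx, _ in classes) for s in range(1, s_max + 1)}
    # Method 3: sum of (class sum of x^2) * x'^(s-2)
    m3 = {s: sum(sxx * m**(s - 2) for m, _, _, sxx in classes) for s in range(2, s_max + 1)}
    return m1, m2, m3

m1, m2, m3 = biased_sums(classes)
print(m1[4], m2[4], m3[4])
```

Only the class marks and three running totals per class are needed; the individual variates never enter the computation once the class sums have been recorded.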


3. General formulas for corrections for grouping bias. We are next led to
the question of correcting these estimates of $\Sigma x^s$ for the bias introduced by
grouping. Before indicating the numerical work, we derive general formulae
for correction for grouping bias.

We assume that the variates are recorded in units of $h$ which means that, in
the case of discrete series, the smallest possible difference between any two
unequal variates is equal to $h$. In case the distribution is continuous, the
recorded values constitute a discrete series recorded in units of $h$. Thus heights
may be recorded to the nearest inch, in which case $h$ is one inch, or to the nearest
one hundredth of an inch, in which case $h$ is 1/100 inch, etc. We assume
further that all possible groupings of $k$ different values are made. Thus if the
smallest variate is $a$, then the values of $x = a,\ a + h,\ a + 2h,\ \ldots,\ a + \tfrac{k-1}{2}h,\ \ldots,\ a + (k-1)h$
are thrown in a group with class mark $a + \tfrac{1}{2}(k - 1)h$. The
$k$ possible sets of groupings of $k$ are made in this way.

We examine the error involved when a specific variate $x$ is replaced by the
class mark $x'$ in each of these groupings. The values of the lower open limit, $L$,
the upper open limit, $U$, the class mark, $x'$, and the error $\epsilon = x - x'$ are
indicated in Table IV. The $k$ different groupings indicated by the different rows
show $x$ at the lower limit, $x$ one step above the lower limit, $x$ two steps above

TABLE III

Values of x', f_x', x_x' and x²_x' for the First Grouping with Computation of Biased Estimates
of Σx^s

x'           f_x'        x_x'         x²_x'       x'²        x'³          x'⁴
53              1          50         2,500     2,809    148,877    7,890,481
44              1          48         2,304     1,936     85,184    3,748,096
35              8         287        10,351     1,225     42,875    1,500,625
26             16         402        10,190       676     17,576      456,976
17             27         475         8,515       289      4,913       83,521
8              45         378         3,426        64        512        4,096
-1             53         -73           369         1         -1            1
-10            41        -386         3,924       100     -1,000       10,000
-19            27        -507         9,701       361     -6,859      130,321
-28            12        -338         9,594       784    -21,952      614,656
-37            13        -465        16,717     1,369    -50,653    1,874,161

ΣF_x          244        -129        77,591
Σx'F_x       -181      76,593       -77,825
Σx'²F_x    77,149    -105,105    68,033,657
Σx'³F_x  -134,191  68,267,577
Σx'⁴F_x 69,063,265





the lower limit, etc. It is at once apparent that the errors in replacing $x$ by $x'$ in
the $k$ different ways constitute the deviations from the mean of the rectangular
distribution $h, 2h, 3h, \ldots, kh$. We indicate the moments about the mean of
this rectangular distribution by $R_1, R_2, R_3, \ldots$ and we use the notation $E(\epsilon^t)$
as the sum of the $t$th powers of the $k$ different $\epsilon$'s divided by $k$. It follows that
$E(\epsilon^t) = R_t$. Now the values of $R_t$ are 0 when $t$ is odd and are well known when
$t$ is even [2, 325]. The ones in which we are especially interested are

$$R_2 = \frac{k^2 - 1}{12}\,h^2 \qquad \text{and} \qquad R_4 = \frac{(k^2 - 1)(3k^2 - 7)}{240}\,h^4. \tag{2}$$
























































If an adjustment of scale is made so that the differences between successive
class marks are unity, as is customary, the value of $h$ is $1/k$. The values of
$R_2$ and $R_4$ are then

$$R_2 = \frac{1 - 1/k^2}{12}, \qquad R_4 = \frac{(1 - 1/k^2)(3 - 7/k^2)}{240}. \tag{3}$$

As the number of groupings increases the value of $1/k^2 \to 0$ and the appropriate
values of the moments of the continuous rectangular distribution result. Thus

$$R_2 \to \frac{1}{12}, \qquad R_4 \to \frac{1}{80}, \qquad \text{and} \qquad 6R_2^2 - R_4 \to \frac{7}{240}.$$
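The closed forms for $R_2$ and $R_4$ can be confirmed by direct enumeration of the discrete rectangular distribution. The sketch below is illustrative only (the function name `rect_moment` is an assumption of this example); it uses exact rational arithmetic so that the comparison with the formulas is not blurred by rounding.

```python
from fractions import Fraction

def rect_moment(t, k, h=Fraction(1)):
    """t-th moment about the mean of the k equally likely values h, 2h, ..., kh."""
    values = [i * h for i in range(1, k + 1)]
    mean = sum(values) / k
    return sum((v - mean) ** t for v in values) / k

# Class interval of nine with variates recorded in units of one (k = 9, h = 1).
R2 = rect_moment(2, 9)
R4 = rect_moment(4, 9)
print(R2, R4, 6 * R2**2 - R4)
```

With $k = 9$ and $h = 1$ the enumeration gives the constants used for the illustrative problem, and with $h = 1/k$ it reproduces the rescaled forms that tend to $1/12$ and $1/80$.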


TABLE IV

Open Limits, Class Marks and Errors for the Different Groupings

Grouping   x                     L             U             x' = ½(L + U)        ε = x - x'
1          x = L                 x             x + (k-1)h    x + ½(k-1)h          -½(k-1)h
2          x = L + h             x - h         x + (k-2)h    x + ½(k-3)h          -½(k-3)h
3          x = L + 2h            x - 2h        x + (k-3)h    x + ½(k-5)h          -½(k-5)h
...        ...                   ...           ...           ...                  ...
i          x = L + (i-1)h        x - (i-1)h    x + (k-i)h    x + ½[k-(2i-1)]h     -½[k-(2i-1)]h
...        ...                   ...           ...           ...                  ...
k-1        x = L + (k-2)h        x - (k-2)h    x + h         x - ½(k-3)h          ½(k-3)h
k          x = U = L + (k-1)h    x - (k-1)h    x             x - ½(k-1)h          ½(k-1)h



If now we let $F_x$ be any real function of $x$ defined for the values $x = a,\ a + k,\ a + 2k,\ \ldots$,
we have at once the useful lemma

$$E[\Sigma x^s \epsilon^t F_x] = \Sigma x^s F_x\, E[\epsilon^t] = R_t\, \Sigma x^s F_x. \tag{4}$$

This results from the fact that the values of $x$, and of all functions of $x$, are
unchanged by the groupings even though the values of $x'$ and $\epsilon$ vary.

The $\Sigma$ in (4) indicates a summation with respect to the variates while the
summation with respect to the different errors is taken care of in the $E$ notation.
The limits of the $\Sigma$ in (4) are purposely left indefinite so that either a serial
or a frequency notation can be used. Thus if a serial notation is used, the
limits are from 1 to $N$, the values of $x$ are the variates $x_i$ and $F_x$ becomes $F_{x_i}$.
In this case $F_{x_i}$ may be set equal to unity to give the corrections of method 1,
may be set equal to $x_i$ to give the corrections of method 2, or may be set equal
to $x_i^2$ to give the corrections of method 3. In case the notation of the frequency
distribution is preferred, the limits of the summation are the smallest variate
and the largest variate, the values of $x$ are the values of the different variates
which occur. In this case we may have $F_x = f_x$, the frequency function,
$F_x = x f_x = x_x$, or $F_x = x^2 f_x = x^2_x$.

The continued application of (4) to the terms in the expansion of $E[\Sigma x'^s F_x]$
results in

$$E[\Sigma x'^s F_x] = E[\Sigma (x - \epsilon)^s F_x] = \sum_{i=0}^{s} (-1)^i \binom{s}{i} \Sigma x^{s-i} F_x\, E(\epsilon^i) = \sum_{i=0}^{s} (-1)^i \binom{s}{i} R_i\, \Sigma x^{s-i} F_x. \tag{5}$$

The fact that $R_i = 0$ when $i$ is odd may be used in writing out the expansion.
It is possible to work out a more general theory where the class mark is some
other value (say the smallest variate) rather than the mid-value. In such a
case formula (5) would apply, but the values of $R_i$ would be the values of the
moments of a rectangular distribution rather than the central moments. The
above formula is sufficiently general for the purposes of this paper.
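The sense in which the corrected sums are unbiased can be illustrated numerically: if every one of the $k$ possible groupings is formed and the method 1 corrections are applied to each, the average of the $k$ corrected sums equals the true power sum exactly. The sketch below assumes integer variates ($h = 1$) and an odd class interval; the helper names and the small data set are made up for the illustration and are not from the paper.

```python
from fractions import Fraction

def class_mark(x, offset, k):
    """Mid-value of the class of k consecutive integers that contains x,
    for the grouping whose class boundaries are shifted by `offset` (k odd)."""
    start = x - ((x - offset) % k)      # smallest value in the class of x
    return start + (k - 1) // 2

def averaged_corrected_sums(data, k):
    """Average over the k groupings of the method 1 estimates of
    sum x^2, sum x^3 and sum x^4."""
    n = len(data)
    r2 = Fraction(k * k - 1, 12)
    r4 = Fraction((k * k - 1) * (3 * k * k - 7), 240)
    totals = [Fraction(0)] * 3
    for offset in range(k):
        marks = [class_mark(x, offset, k) for x in data]
        p1, p2, p3, p4 = (sum(m ** s for m in marks) for s in (1, 2, 3, 4))
        totals[0] += p2 - n * r2
        totals[1] += p3 - 3 * r2 * p1
        totals[2] += p4 - 6 * r2 * p2 + (6 * r2 * r2 - r4) * n
    return [t / k for t in totals]

data = [-41, -7, 0, 3, 3, 12, 28, 50]   # hypothetical variates
print(averaged_corrected_sums(data, 9))
print([sum(x ** s for x in data) for s in (2, 3, 4)])
```

Any single grouping may still be far from the true value; the equality holds only for the average over all $k$ groupings, which is the point made in the introduction.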

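Specific values of (5) when $s = 0, 1, 2, 3, 4$ are

$$\begin{aligned}
E[\Sigma F_x] &= \Sigma F_x \\
E[\Sigma x' F_x] &= \Sigma x F_x \\
E[\Sigma x'^2 F_x] &= \Sigma x^2 F_x + R_2\, \Sigma F_x \\
E[\Sigma x'^3 F_x] &= \Sigma x^3 F_x + 3R_2\, \Sigma x F_x \\
E[\Sigma x'^4 F_x] &= \Sigma x^4 F_x + 6R_2\, \Sigma x^2 F_x + R_4\, \Sigma F_x.
\end{aligned} \tag{6}$$

These equations can be solved for $\Sigma F_x$, $\Sigma x F_x$, etc., in terms of the expected
values. If we use the inverse operator and write $E^{-1}[B] = A$ instead of $E[A] = B$
we have

$$\begin{aligned}
E^{-1}[\Sigma x F_x] &= \Sigma x' F_x \\
E^{-1}[\Sigma x^2 F_x] &= \Sigma x'^2 F_x - R_2\, \Sigma F_x \\
E^{-1}[\Sigma x^3 F_x] &= \Sigma x'^3 F_x - 3R_2\, \Sigma x' F_x \\
E^{-1}[\Sigma x^4 F_x] &= \Sigma x'^4 F_x - 6R_2\, \Sigma x'^2 F_x + (6R_2^2 - R_4)\, \Sigma F_x
\end{aligned} \tag{7}$$

and in general,

$$E^{-1}[\Sigma x^s F_x] = \Sigma x'^s F_x - \sum_{i=1}^{s} (-1)^i \binom{s}{i} R_i\, E^{-1}[\Sigma x^{s-i} F_x].$$

These values $E^{-1}[\Sigma x^s F_x]$ are unbiased estimates of $\Sigma x^s F_x$ since

$$E\big[E^{-1}[\Sigma x^s F_x]\big] = \Sigma x^s F_x.$$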





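The corrections for method 1, the customary corrections, are obtained if a
serial notation is used with $F_x = 1$. The corrections for method 2 are obtained
if a serial notation is used with $F_x = x$. The corrections for method 3 are
obtained with $F_x = x^2$. Thus we have

$$\begin{aligned}
E^{-1}[N] &= N \\
E^{-1}[\Sigma x] &= \Sigma x' \\
E^{-1}[\Sigma x^2] &= \Sigma x'^2 - N R_2 \\
E^{-1}[\Sigma x^3] &= \Sigma x'^3 - 3R_2\, \Sigma x' \\
E^{-1}[\Sigma x^4] &= \Sigma x'^4 - 6R_2\, \Sigma x'^2 + (6R_2^2 - R_4) N,
\end{aligned} \tag{8}$$

$$\begin{aligned}
E^{-1}[\Sigma x] &= \Sigma x \\
E^{-1}[\Sigma x^2] &= \Sigma x x' \\
E^{-1}[\Sigma x^3] &= \Sigma x x'^2 - R_2\, \Sigma x \\
E^{-1}[\Sigma x^4] &= \Sigma x x'^3 - 3R_2\, \Sigma x x' \quad \text{etc.,}
\end{aligned} \tag{9}$$

and

$$\begin{aligned}
E^{-1}[\Sigma x^2] &= \Sigma x^2 \\
E^{-1}[\Sigma x^3] &= \Sigma x^2 x' \\
E^{-1}[\Sigma x^4] &= \Sigma x^2 x'^2 - R_2\, \Sigma x^2.
\end{aligned} \tag{10}$$

These formulas are the ones used in obtaining the unbiased estimates in methods
1, 2, 3 from the biased estimates in Table I. In this case $R_2 = \frac{20}{3}$, $R_4 = \frac{236}{3}$,
$6R_2^2 - R_4 = 188$, $N = 244$ and the values follow by
direct substitution in (8), (9), (10) above.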
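As a check on the substitution, the corrections of methods 1, 2 and 3 can be applied to the biased sums of the first grouping. This is a minimal sketch, assuming the biased sums listed at the foot of Table III and the constants $R_2 = 20/3$, $R_4 = 236/3$, $N = 244$; the dictionary names are illustrative.

```python
from fractions import Fraction

N = 244
R2 = Fraction(20, 3)     # (k^2 - 1)/12 with k = 9, h = 1
R4 = Fraction(236, 3)    # (k^2 - 1)(3k^2 - 7)/240 with k = 9, h = 1

# Biased sums for grouping I, keyed by the power s.
m1 = {1: -181, 2: 77149, 3: -134191, 4: 69063265}   # sum of x'^s
m2 = {1: -129, 2: 76593, 3: -105105, 4: 68267577}   # sum of x * x'^(s-1)
m3 = {2: 77591, 3: -77825, 4: 68033657}             # sum of x^2 * x'^(s-2)

# Method 1 (Sheppard's corrections), method 2 and method 3.
method1 = {2: m1[2] - N * R2,
           3: m1[3] - 3 * R2 * m1[1],
           4: m1[4] - 6 * R2 * m1[2] + (6 * R2**2 - R4) * N}
method2 = {2: m2[2],
           3: m2[3] - R2 * m2[1],
           4: m2[4] - 3 * R2 * m2[2]}
method3 = {3: m3[3],
           4: m3[4] - R2 * m3[2]}

for name, est in (("1-U", method1), ("2-U", method2), ("3-U", method3)):
    print(name, {s: float(v) for s, v in est.items()})
```

The results agree with the rows 1-U, 2-U and 3-U for grouping I in Table I; the thirds that appear there come from the factor $R_2 = 20/3$.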

4. Compound grouping formulas. So far nothing has been said about the
calculation of the results by methods 4 and 5. These methods might be called
compound grouping methods, since they utilize the biased results of more than
one grouping method. The values of $\Sigma x'^s$ and $\Sigma x x'^{s-1}$ are needed for method 4
and the values of $\Sigma x'^s$, $\Sigma x x'^{s-1}$, $\Sigma x^2 x'^{s-2}$ for method 5. The formulas for method
4 are first presented. The argument is given in some detail for the value of
$E^{-1}[\Sigma x^2]$. Now

$$\begin{aligned}
\Sigma x^2 &= \Sigma (x' + \epsilon)^2 = \Sigma x'^2 + 2\Sigma x' \epsilon + \Sigma \epsilon^2 \\
&= \Sigma x'^2 + 2\Sigma x'(x - x') + \Sigma \epsilon^2
\end{aligned}$$

so that

$$\Sigma x^2 = -\Sigma x'^2 + 2\Sigma x x' + \Sigma \epsilon^2.$$

If the values of $\epsilon$ were known, we would have the exact value of $\Sigma x^2$ since we know
$\Sigma x'^2$ and $\Sigma x x'$. However, we do not know these values of $\epsilon$ from a single grouping,
so we derive a formula giving unbiased estimates of $\Sigma x^2$. We have at once

$$E[\Sigma x^2] = \Sigma x^2 = E[-\Sigma x'^2 + 2\Sigma x x'] + E[\Sigma \epsilon^2]$$

and since $N R_2 = E[\Sigma \epsilon^2]$ we have

$$\Sigma x^2 = E[-\Sigma x'^2 + 2\Sigma x x' + N R_2] \qquad \text{and}$$

$$E^{-1}[\Sigma x^2] = -\Sigma x'^2 + 2\Sigma x x' + N R_2.$$

There is a relatively small error in this estimate since the only error involved
is the difference between $N R_2$ and the actual sum of the squares of the $\epsilon$'s.
This formula is the basis of the values of $E^{-1}[\Sigma x^2]$ recorded in Table I under
method 4. For example, in grouping I, the estimate is $-77149 + 2(76593) + 244(\tfrac{20}{3}) = 77663\tfrac{2}{3}$
and this differs by only $72\tfrac{2}{3}$ from the exact value.

In a corresponding manner, we may prove

$$\begin{aligned}
E^{-1}[\Sigma x^3] &= -2\Sigma x'^3 + 3\Sigma x x'^2 + 3R_2\, \Sigma x \\
E^{-1}[\Sigma x^4] &= -3\Sigma x'^4 + 4\Sigma x x'^3 + 6R_2\, E^{-1}[\Sigma x^2] + 3N R_4.
\end{aligned} \tag{12}$$

Different values of $E^{-1}[\Sigma x^2]$ can be used. In the calculations of Table I the
values $E^{-1}[\Sigma x^2] = \Sigma x x'$ from (9) were used, but the values $E^{-1}[\Sigma x^2] = -\Sigma x'^2 + 2\Sigma x x' + N R_2$
could be used to give somewhat better results.
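The method 4 estimates for the first grouping can be reproduced from these formulas. The sketch below assumes the biased sums of grouping I from Table III and, as in the calculations of Table I, takes $E^{-1}[\Sigma x^2] = \Sigma x x'$ inside the fourth-power formula; all variable names are illustrative.

```python
from fractions import Fraction

N, R2, R4 = 244, Fraction(20, 3), Fraction(236, 3)

sum_x = -129                                             # exact: E^{-1}[sum x] = sum x
sum_xp2, sum_xp3, sum_xp4 = 77149, -134191, 69063265     # sum x'^2, x'^3, x'^4
sum_xxp, sum_xxp2, sum_xxp3 = 76593, -105105, 68267577   # sum x x', x x'^2, x x'^3

# Method 4 (compound) estimates of sum x^2, sum x^3, sum x^4.
est_x2 = -sum_xp2 + 2 * sum_xxp + N * R2
est_x3 = -2 * sum_xp3 + 3 * sum_xxp2 + 3 * R2 * sum_x
est_x4 = -3 * sum_xp4 + 4 * sum_xxp3 + 6 * R2 * sum_xxp + 3 * N * R4

print(float(est_x2), est_x3, est_x4)
print("error in sum x^2:", est_x2 - 77591)
```

The first estimate is the worked figure $77663\tfrac{2}{3}$ of the text, and the other two match the 4-U row for grouping I in Table I.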

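It can be shown also that

$$\begin{aligned}
E^{-1}[\Sigma x^5] &= -4\Sigma x'^5 + 5\Sigma x x'^4 + 10R_2\, E^{-1}[\Sigma x^3] + 15R_4\, \Sigma x, \\
E^{-1}[\Sigma x^6] &= -5\Sigma x'^6 + 6\Sigma x x'^5 + 15R_2\, E^{-1}[\Sigma x^4] + 45R_4\, E^{-1}[\Sigma x^2] + 5R_6 N,
\end{aligned} \tag{13}$$

and, after some argument, that

$$E^{-1}[\Sigma x^s] = -(s - 1)\Sigma x'^s + s\,\Sigma x x'^{s-1} + \sum_{i=1}^{[\frac{1}{2}s]} (2i - 1) \binom{s}{2i} R_{2i}\, E^{-1}[\Sigma x^{s-2i}] \tag{14}$$

where $[\tfrac{1}{2}s]$ indicates the integer $\tfrac{1}{2}s$ or $\tfrac{1}{2}(s - 1)$.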

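It is possible to obtain better unbiased estimates if we use in addition the
values of $\Sigma x^2 x'^{s-2}$. In this case the values of $\Sigma x$ and $\Sigma x^2$ are known exactly,
and after expansion of $\Sigma (x' + \epsilon)^s$, replacement of $\epsilon$ by $x - x'$ and of $\epsilon^2$ by $(x - x')^2$,
and further reduction, we get

$$\begin{aligned}
E^{-1}[\Sigma x^3] &= \Sigma x'^3 - 3\Sigma x x'^2 + 3\Sigma x^2 x', \\
E^{-1}[\Sigma x^4] &= 3\Sigma x'^4 - 8\Sigma x x'^3 + 6\Sigma x^2 x'^2 - 3N R_4, \\
E^{-1}[\Sigma x^5] &= 6\Sigma x'^5 - 15\Sigma x x'^4 + 10\Sigma x^2 x'^3 - 15R_4\, \Sigma x, \\
E^{-1}[\Sigma x^6] &= 10\Sigma x'^6 - 24\Sigma x x'^5 + 15\Sigma x^2 x'^4 - 45R_4\, \Sigma x^2 - 10N R_6,
\end{aligned} \tag{15}$$

and in general,

$$E^{-1}[\Sigma x^s] = \tfrac{1}{2}(s - 1)(s - 2)\,\Sigma x'^s - s(s - 2)\,\Sigma x x'^{s-1} + \tfrac{1}{2}s(s - 1)\,\Sigma x^2 x'^{s-2} - \sum_{i=2}^{[\frac{1}{2}s]} \binom{s}{2i}(2i - 1)(i - 1)\, R_{2i}\, E^{-1}[\Sigma x^{s-2i}]. \tag{16}$$

Compound formulas involving additional quantities such as $\Sigma x^3 x'^{s-3}$, $\Sigma x^4 x'^{s-4}$,
etc., can be worked out by the methods outlined above.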
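The method 5 estimates for the first grouping follow in the same way. The sketch below assumes the biased sums of grouping I from Table III and writes the correction to the fourth-power sum as $-3NR_4$, which is the value obtained by carrying out the expansion described above; the helper `leading_coefficients` is an illustrative name for the three leading coefficients of the general formula.

```python
from fractions import Fraction

N, R4 = 244, Fraction(236, 3)

sum_xp3, sum_xp4 = -134191, 69063265       # sum x'^3, sum x'^4
sum_xxp2, sum_xxp3 = -105105, 68267577     # sum x x'^2, sum x x'^3
sum_x2xp, sum_x2xp2 = -77825, 68033657     # sum x^2 x', sum x^2 x'^2

def leading_coefficients(s):
    """Coefficients of sum x'^s, sum x x'^(s-1), sum x^2 x'^(s-2) in the general formula."""
    return ((s - 1) * (s - 2) // 2, -s * (s - 2), s * (s - 1) // 2)

a3, b3, c3 = leading_coefficients(3)
a4, b4, c4 = leading_coefficients(4)

# Method 5 estimates of sum x^3 and sum x^4 for grouping I.
est_x3 = a3 * sum_xp3 + b3 * sum_xxp2 + c3 * sum_x2xp
est_x4 = a4 * sum_xp4 + b4 * sum_xxp3 + c4 * sum_x2xp2 - 3 * N * R4

print(est_x3, est_x4)
```

Both figures agree with the 5-U row for grouping I in Table I, and they lie within a few hundred and about forty-six thousand of the true values -52,005 and 69,239,951 respectively.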


5. Computational methods. It has been shown in sections 3 and 4 how the
unbiased estimates can be obtained from the biased estimates. It is the purpose
of the present section to show how these biased estimates can be computed
efficiently. One method of calculation was shown in Table III. The values
of $f_{x'}$, $x_{x'}$, $x^2_{x'}$ were computed and recorded, and the resulting power sums
obtained. This is the most direct means of computation and if the number
of groups is small and if a modern computing machine equipped with automatic
positive and negative multiplication is available, it may be the preferred
method. It should be noted that the values of $x^2_{x'}$ in Table III are obtained
most easily with the use of a machine which permits the calculation of the
square with a single key punching operation.

It is customary to use the devices of subtraction of a constant (either a central
class mark or the smallest class mark) and division by a constant (size of the
class interval) to simplify the computational work. Thus in Table III we
could use the transformation $d' = \dfrac{x' - (-37)}{9}$ and compute the values of
$\Sigma d'^s F_x$. If $F_x$ is the frequency function, we have the usual formulas, but if
$F_x$ is $x_{x'}$ or $x^2_{x'}$, then the results are terms of the type $\Sigma x d'^{s-1}$ or $\Sigma x^2 d'^{s-2}$. It is
possible to reduce these to equivalent variables by the use of $d = \dfrac{x - (-37)}{9}$
so that the values of $\Sigma d'^s$, $\Sigma d d'^{s-1}$, $\Sigma d^2 d'^{s-2}$ result. We then correct for bias
with the use of the formulas of sections 3 and 4 where the power sums of the
rectangular distribution are computed with $h = 1/k$.

Another method which in many cases is preferable to that just described is
the method of cumulative totals. The values of $f_{x'}$, $x_{x'}$, and $x^2_{x'}$ are cumulated
successively for the different values of x' and the values of the biased grouping
estimates are obtained immediately from the entries in the last few rows. The
cumulations of Table III are shown in Table V. The entries in the column
of the highest cumulations of $f_{x'}$, $x_{x'}$, $x^2_{x'}$, with the exception of those at the
bottom of the column, need not be recorded.
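The principle of cumulative totals can be illustrated on a small scale. The sketch below is only an assumption about the general idea, not a reproduction of Table V: it indexes the classes j = 0, 1, 2, ... from the lowest class, cumulates the frequencies from the highest class downward twice, and recovers the first two power sums from the column totals. The frequencies are invented for the illustration.

```python
def cumulate_from_top(values):
    """Running totals accumulated from the highest class down to the lowest."""
    out, total = [], 0
    for v in reversed(values):
        total += v
        out.append(total)
    return list(reversed(out))

def power_sums_by_cumulation(freq):
    """Return (sum f, sum j*f, sum j^2*f) for class indices j = 0, 1, 2, ..."""
    c1 = cumulate_from_top(freq)   # first cumulation
    c2 = cumulate_from_top(c1)     # second cumulation
    n = c1[0]                      # total frequency
    s1 = sum(c1[1:])               # equals sum of j * f_j
    s2 = 2 * sum(c2[1:]) - s1      # equals sum of j^2 * f_j
    return n, s1, s2

freq = [1, 0, 3, 2]                # invented frequencies for classes j = 0..3
print(power_sums_by_cumulation(freq))
```

Only additions are needed until the final step, which is why the method suits adding machines and tabulators; applying the same cumulation to the class totals of x and of x squared in place of the frequencies gives the corresponding mixed sums.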

TABLE V

Computation of Biased Estimates Using Cumulative Totals

GROUPING METHODS

It is possible to provide multipliers for these entries by an adaptation of
a method given in an earlier paper [3]. A table of multipliers has a top marginal
row composed of a, a + k, a + 2k, etc., and a left marginal column composed of
k - a, 2k - a, etc. The first row in the table is composed of 1, k - a, (k - a)^2,
(k - a)^3, etc., and the first column of 1, a, a^2, a^3, etc. Each entry in the table
is found by adding the product of the entry above it and the columnar heading
to the product of the entry at the left and the row heading. The multipliers
for a = -37 are shown in Table VI.

The diagonal terms are the multipliers of the values of a given cumulation.
Thus the multipliers of the bottom entries of the columns of each of the three
sets of cumulations of Table V are successively 1; -37, 46; 1369, -3323,
2116; etc.
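The rule for building the table of multipliers is easily mechanized. The following sketch assumes that the column with first entry (k - a)^j carries the heading a + jk and that the row with first entry a^i carries the heading (i + 1)k - a; this reading reproduces the legible entries of Table VI for a = -37, k = 9. The function name and the triangular layout are choices made for the illustration.

```python
def multiplier_table(a, k, size):
    """Triangular table of multipliers; entry [i][j] is filled when i + j < size."""
    col_head = [a + j * k for j in range(size)]        # a, a + k, a + 2k, ...
    row_head = [(i + 1) * k - a for i in range(size)]  # k - a, 2k - a, ...
    table = [[None] * size for _ in range(size)]
    for j in range(size):
        table[0][j] = (k - a) ** j                     # first row: powers of k - a
    for i in range(size):
        table[i][0] = a ** i                           # first column: powers of a
    for i in range(1, size):
        for j in range(1, size - i):
            # entry above times column heading, plus entry at left times row heading
            table[i][j] = (table[i - 1][j] * col_head[j]
                           + table[i][j - 1] * row_head[i])
    return table

def diagonal(table, d):
    """Multipliers on the d-th diagonal, read from the first column upward."""
    return [table[d - j][j] for j in range(d + 1)]

t = multiplier_table(-37, 9, 5)
for d in range(3):
    print(diagonal(t, d))
```

The first three diagonals come out as 1; -37, 46; 1369, -3323, 2116, in agreement with the multipliers quoted in the text.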

This method is ideally adapted to the use of Hollerith cards. The information
is punched on the cards to the number of places desired. The computational
grouping is then accomplished by sorting. As an illustration we take the


TABLE VI

Multipliers when a = -37 and k = 9

            -37          -28           -19            -10           -1
46            1           46         2,116         97,336    4,477,456
55          -37       -3,323      -222,969    -13,236,655
64        1,369      180,660    15,798,651
73      -50,653   -8,756,149
82    1,874,161


TABLE VII

Hollerith Illustration

x' - 4.5    C(f_x')    C(x_x')    C(y_x')
   210           2        422          …
   200           6      1,242          …
   190          10      2,017          …
   180          21          …          …
   170          48      8,727          …
   160         110     18,923     76,565
   150         250     40,466          …
   140         458     70,302    316,073
   130         719    105,431    493,771
   120         900    127,990    613,763
   110         980    137,203          …
   100         999    139,199    678,290
    90         999    139,199    678,290
    80       1,000    139,288    678,896


records of the weights of 1000 students as reported by Carver [4] when measured
to the nearest pound. The value of Σx is 139,288 lbs. and that of Σx^2 is 10,092,460
(lb.)^2 and we wish to obtain approximations to those values by grouping. If we
let the grouping intervals be 80-89, 90-99, etc., with class marks x' = 84.5,
94.5, 104.5, etc., we would find by usual methods Σx' = 139,520 lbs. and Σx'^2 =
19,760,430 (lb.)^2. However, it is possible to wire in the three place number x,
and to get from the same number of groups Σx = 139,288 lbs. and Σxx' =
19,727,326 (lb.)^2. The unbiased values for method (1), (2), or (4) can be
computed with the appropriate formulas of sections 3 and 4.
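The value Σx' = 139,520 can be recovered from the cumulated frequencies alone. The sketch below uses the column C(f_x') of Table VII as read from the scanned table (the readings 6, 110 and 458 are interpretations of damaged entries and are therefore assumptions), with classes ten pounds wide and lowest class mark 84.5.

```python
# Cumulated frequencies C(f_x'), highest class (210-219) first, lowest (80-89) last.
cum_f = [2, 6, 10, 21, 48, 110, 250, 458, 719, 900, 980, 999, 999, 1000]

def sum_of_class_marks(cum_counts, lowest_mark, width):
    """Sum of x' over all observations from counts cumulated from the top.

    Every observation contributes the lowest class mark once, plus one class
    width for each class boundary it lies above.
    """
    total = cum_counts[-1]
    return lowest_mark * total + width * sum(cum_counts[:-1])

def class_frequencies(cum_counts):
    """Recover the individual class frequencies by differencing."""
    return [c - p for p, c in zip([0] + cum_counts[:-1], cum_counts)]

sum_xg = sum_of_class_marks(cum_f, 84.5, 10)
freqs = class_frequencies(cum_f)
marks = [214.5 - 10 * i for i in range(len(cum_f))]
direct = sum(f * m for f, m in zip(freqs, marks))
print(sum_xg, direct)
```

Both routes give 139,520, the figure obtained "by usual methods" in the text, which supports the readings adopted for the damaged entries.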






The Hollerith run is shown in Table VII where the first column indicates
the smallest variate in the class rather than the class mark. The next columns
show $C(f_{x'})$ and $C(x_{x'})$. The fourth column $C(y_{x'})$ is discussed in a later section.

The values for method 3 and method 5 cannot be obtained so readily, since
the quantities to be grouped are the x^2 and these do not appear on the card.
However, it is possible to use a multiplying punch or to use a table of squares
in the form of prepunched cards to get these values of x^2 on the cards. It might
be preferable, in some cases, to do this work and then to use a coarser grouping
than would be used otherwise.


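6. Moments. The formulas (7), (8), (9), (10) give moment formulas if the
proper values of $F_x$ are assigned. We let $\nu_p = \sum x'^p / N$ and $\nu_{pq} = \sum x'^p x^q / N$ and have,
in case $F_x = f_{x'}/N$ in (7), the usual formulas

$$
\begin{aligned}
E^{-1}[\mu_1] &= \nu_1 \\
E^{-1}[\mu_2] &= \nu_2 - R_2 \\
E^{-1}[\mu_3] &= \nu_3 - 3\nu_1 R_2 \\
E^{-1}[\mu_4] &= \nu_4 - 6\nu_2 R_2 + (6R_2^2 - R_4).
\end{aligned}
\tag{17}
$$

If $F_x = x_{x'}/N$ we have

$$
\begin{aligned}
E^{-1}[\mu_1] &= \mu_1 \\
E^{-1}[\mu_2] &= \nu_{11} \\
E^{-1}[\mu_3] &= \nu_{21} - R_2\mu_1 \\
E^{-1}[\mu_4] &= \nu_{31} - 3R_2\nu_{11}.
\end{aligned}
\tag{18}
$$

While if $F_x = x^2_{x'}/N$ we have

$$
\begin{aligned}
E^{-1}[\mu_2] &= \mu_2 \\
E^{-1}[\mu_3] &= \nu_{12} \\
E^{-1}[\mu_4] &= \nu_{22} - R_2\mu_2.
\end{aligned}
\tag{19}
$$

Similar formulas can be written for methods (4) and (5).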
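The sense in which these estimates are unbiased with respect to grouping can be checked on a small example. The sketch below rests on assumptions not stated in this section: integer recorded values, classes k units wide, each of the k possible positions of the class boundaries taken as equally likely, and R_2 = (k^2 - 1)/12 for the resulting rectangular error distribution. Under those assumptions the average of Σx'^2 exceeds Σx^2 by exactly N R_2, while the average of Σxx' equals Σx^2. The data are invented.

```python
from fractions import Fraction

def class_mark(x, origin, k):
    """Mid-point of the width-k class containing integer x, classes starting at origin."""
    return origin + k * ((x - origin) // k) + Fraction(k - 1, 2)

def averages_over_origins(data, k):
    """Average of sum(x'^2) and of sum(x * x') over the k possible class origins."""
    sum_xg2 = Fraction(0)
    sum_xxg = Fraction(0)
    for origin in range(k):
        for x in data:
            xg = class_mark(x, origin, k)
            sum_xg2 += xg * xg
            sum_xxg += x * xg
    return sum_xg2 / k, sum_xxg / k

data = [3, 8, 8, 14, 21, 22, 30]   # invented integer measurements
k = 5                              # class width in recording units
r2 = Fraction(k * k - 1, 12)       # second moment of the k equally likely errors
mean_xg2, mean_xxg = averages_over_origins(data, k)
sum_x2 = sum(x * x for x in data)
print(mean_xg2 - len(data) * r2 == sum_x2, mean_xxg == sum_x2)
```

Exact rational arithmetic is used so that the two equalities hold without rounding; dividing through by N gives the second lines of (17) and (18).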

Previous to Carver's article in 1936 it was assumed that central moments
could be used in place of moments in formulas (17) without introducing bias,
but this article demonstrated that estimates obtained in this way are slightly
biased. Thus

$$
\begin{aligned}
E(\bar\nu_2) = E(\nu_2 - \nu_1^2) &= E(\nu_2) - E(\nu_1^2) \\
&= \mu_2 + R_2 - \mu_2(\nu_1) = \mu_2 + R_2 - [\bar\mu_2(\nu_1) + \mu_1^2] \\
&= \bar\mu_2 + R_2 - \bar\mu_2(\nu_1)
\end{aligned}
$$

so that $E^{-1}[\bar\mu_2] = \bar\nu_2 - R_2 + \bar\mu_2(\nu_1)$,
and so $\bar\nu_2 - R_2$ is a biased estimate of $\bar\mu_2$.





The general question of unbiased estimates of the central power sums and
the central moments is one which has been studied for the conventional case by
Pierce [2] and Craig [5]. The more general discussion resulting from the
introduction of the new methods is one which may well be deferred to a later paper.
It is interesting to note in passing that the estimate of the variance obtained by
substituting central moments for moments in method (2) is not biased since

$$E(\bar\nu_{11}) = E(\nu_{11} - \nu_1\mu_1) = \mu_2 - \mu_1^2 = \bar\mu_2.$$

It is to be noted that the formulas previously used give correct results
when the adjustments are defined to make the power sums and the moments,
rather than the central power sums and the central moments, unbiased with
respect to grouping. A sensible method of procedure in such a case is to make
the correction on the power sum as soon as it is computed.

7. Product moments. Correlation. The introduction of additional variables
opens up a variety of situations, since each of the variables may be grouped in
different ways. Of these situations, one is immediately solved with the use
of the formulas of section 3, and that is the case when one of the variables is
not grouped. Let y be the ungrouped variable and let $y_{x'}$ be the sum of
all the values of y having the same x grouped value, x'. This situation is
frequently encountered when using Hollerith cards, as it is only necessary to
wire in the whole variable y and take totals when the smallest value of x in the
group is attained. Thus in Table VII, the value of $C(y_{x'})$ can be obtained
simultaneously with the values of $C(f_{x'})$ and $C(x_{x'})$. Additional cumulations
$C(z_{x'})$, $C(w_{x'})$, etc., could be obtained at the same time. It follows from Table
VII that

$$E^{-1}[\textstyle\sum y] = \sum y = 678{,}896$$

$$E^{-1}[\textstyle\sum xy] = \sum x'y = 94{,}929{,}322.$$

The actual value of Σxy is 94,774,336.

The general development of the theory of unbiased estimates of product
moments is too extensive to be inserted here, but a brief outline might be
indicated. We let the grouping errors be $\epsilon = x - x'$ and $\eta = y - y'$. Then the
generalization of the lemma (4) is

$$E\Big[\sum_x \sum_y \epsilon^b \eta^d F_x G_y\Big] = R_{b0} R_{0d} \sum_x \sum_y F_x G_y, \tag{21}$$

where $R_{b0}$ is the bth central moment of the rectangular distribution consisting
of ε's and $R_{0d}$ is the dth central moment of the rectangular distribution
consisting of η's. This is applied in turn to the terms of the expansion of
$\sum x'^a y'^c F_x G_y$.

For example

$$
\begin{aligned}
E[\textstyle\sum x'y'F_xG_y] &= E\sum (x - \epsilon)(y - \eta)F_xG_y \\
&= \sum xyF_xG_y - R_{10}\sum yF_xG_y - R_{01}\sum xF_xG_y + R_{10}R_{01}\sum F_xG_y,
\end{aligned}
\tag{22}
$$

and if $F_x = 1$ and $G_y = 1$, we have

$$E[\textstyle\sum x'y'] = \sum xy \quad\text{so that}\quad E^{-1}[\textstyle\sum xy] = \sum x'y'. \tag{23}$$

If we use the customary device of correcting the moments for bias, rather
than the central moments or the ratio which is the correlation coefficient, we
have the usual formula for correction of the correlation coefficient in which
the numerator term is not corrected for bias, but the values in the denominator
are corrected.

The use of method 2 gives Σx'y and Σxy' as unbiased estimates of Σxy.
It has been pointed out that these quantities Σx'y and Σxy' are readily obtained
when the actual values of x and y are punched on Hollerith cards. Each is
in general a better estimate of Σxy than is Σx'y' since one of the values in the
product, in each case, involves no approximation. An average of these might
be taken to obtain a better estimate of Σxy. If the values of Σx' and Σy' are
also available, it is preferable to use the formula

(24) $E^{-1}[r] = $ …, where $\Delta_{xy} = N\sum xy - (\sum x)(\sum y)$.

The 1000 cards of weights and heights were used in this way with the digits
grouped. There resulted (dimensions omitted)

N    = 1,000
Σx   = 139,288          Σy   = 678,896
Σx'  = 139,520          Σy'  = 679,420
Σxx' = 19,722,326       Σyx' = 94,929,322
Σxy' = 94,848,036       Σyy' = 461,886,052

which gives $E^{-1}[r]$ = .4957. The ungrouped 4 place value is .4952.

For use without Hollerith machines, this method indicates the recording of
the values of $y_{x'y'}$ and $x_{x'y'}$ as well as $f_{x'y'}$ for entry in the correlation chart.

The generalization of method 4 leads to

$$\textstyle\sum xy = \sum (x' + \epsilon)(y' + \eta) = -\sum x'y' + \sum x'y + \sum xy' + \sum \epsilon\eta$$

so that we have

$$E^{-1}[\textstyle\sum xy] = -\sum x'y' + \sum xy' + \sum x'y. \tag{26}$$
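The decomposition leading to (26) is a purely algebraic identity and can be confirmed numerically. The sketch below is illustrative only: the paired values, the class widths and the class origins are invented, and exact rational arithmetic is used so that the equality is tested without rounding.

```python
from fractions import Fraction

def class_mark(v, origin, k):
    """Mid-point of the width-k class containing integer v, classes starting at origin."""
    return origin + k * ((v - origin) // k) + Fraction(k - 1, 2)

def product_sums(xs, ys, kx, ky, ox=0, oy=0):
    """Return (sum xy, -sum x'y' + sum x'y + sum xy', sum eps*eta)."""
    xg = [class_mark(x, ox, kx) for x in xs]
    yg = [class_mark(y, oy, ky) for y in ys]
    s_xy = sum(x * y for x, y in zip(xs, ys))
    s_xgyg = sum(a * b for a, b in zip(xg, yg))
    s_xgy = sum(a * y for a, y in zip(xg, ys))
    s_xyg = sum(x * b for x, b in zip(xs, yg))
    s_err = sum((x - a) * (y - b) for x, a, y, b in zip(xs, xg, ys, yg))
    return s_xy, -s_xgyg + s_xgy + s_xyg, s_err

xs = [112, 125, 131, 148, 150, 163, 177]   # invented weights
ys = [640, 655, 671, 668, 690, 702, 715]   # invented heights
s_xy, estimate, s_err = product_sums(xs, ys, kx=10, ky=20)
print(s_xy == estimate + s_err)
```

The estimate of (26) therefore differs from the true Σxy only by Σεη, the sum of products of the two grouping errors.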

It is to be noted that the quantity Σx'y' is the unbiased estimate of Σxy
resulting from the usual frequency distribution. This formula can be used
with formula (11) of section 4 to obtain an estimate of the correlation
coefficient.

The correlation chart application of method 4 demands the triple entry
$f_{x'y'}$ ($x_{x'y'}$) $y_{x'y'}$ for each of the squares of the correlation chart. From these
values it is possible to compute all the entries needed to use method 4.



In general the values of $E[\sum x'^a y'^c F_x G_y]$ can be worked out with the repeated
use of lemma (21). The reader who understands the developments of sections
3 and 4 should have little difficulty in writing out the formulas resulting here.

It should be pointed out that, in cases where the first and second order
moments only are desired, it is frequently advisable to avoid grouping by using
modern computing machines and, in this way, to eliminate the trouble and the
errors caused by grouping [6].

8. Conclusion. There are additional points which might be considered, but
they would take considerable space and the presentation is now sufficiently
complete to enable one to obtain some perspective on the proper use of the
new methods.

If precision is not needed, the use of the former grouping methods is advised.
But if additional precision is needed, and if the results of a single grouping
only are available, it is advised to use the newer methods. Method 2 is much
more satisfactory than method 1 and, in many cases, will be sufficient, but, if
additional precision is demanded, one can use method 3 or one of the
compound methods.

In general there are two kinds of groupings. One is a recorded grouping,
and expresses the measures in terms of the units which are desired, while the
second is a computational grouping which is introduced for the purpose of ease
of computation. Now the recorded grouping, no matter whether obtained from
discrete or continuous data, is necessarily discrete. Thus the weights, when
measured, have to be recorded to the nearest pound, or the nearest tenth of a
pound, or to the nearest hundredth of a pound, etc. The formulas to be applied
to the results of computational grouping are the formulas for discrete variates.
If in addition one wishes to correct continuous data for the recorded grouping,
he may then apply the usual Sheppard's corrections for continuous data.
However, it is advised to make the recording grouping sufficiently detailed so that
the errors are slight. Thus one might record the values of weights to the nearest
tenth of a pound, but use ten-pound intervals in making calculations. In this
case the values when corrected for the computational grouping (to the nearest
tenth of a pound) would presumably be sufficiently precise so that the additional
grouping for recording would not be necessary. (In many cases the two
grouping corrections are combined in a single grouping correction for continuous
data.)

It appears that it is not sufficiently satisfactory to continue to record the
results of grouping in the usual form of a class mark (or class limits) and a
frequency if the results are to be used by others. The table should include an
additional column of $x_{x'}$ and preferably a column of $x^2_{x'}$, where the x' are the
computational grouped values and the x are the measured values recorded to
a considerable degree of accuracy. The arrangement takes little more space
than the present frequency distribution, and it can be obtained from the recorded
values with a reasonable amount of additional work. In the case of correlation
it is advised that the present entries of frequencies in the correlation chart
be augmented by the values of $x_{x'y'}$ and $y_{x'y'}$ for each square. In this way it
is possible for those who may use the distributions later to obtain much better
estimates than would be possible from the frequency distributions as now
recorded. This plan certainly should be considered by all those who prepare
tables for general use and yet are forced by practical considerations to use some
sort of grouping in reporting the results.

REFERENCES

[1] H. C. Carver, "The fundamental nature and proof of Sheppard's adjustments," Annals
of Math. Stat., Vol. 7 (1936), pp. 154-163.

[2] J. A. Pierce, "A study of a universe of n finite populations with application to
moment-function adjustments for grouped data," Annals of Math. Stat., Vol. 11 (1940),
pp. 311-334.

[3] P. S. Dwyer, "The computation of moments with the use of cumulative totals," Annals
of Math. Stat., Vol. 9 (1938), pp. 288-303.

[4] H. C. Carver, Anthropometric Data, Edwards Brothers, Ann Arbor, Mich., 1941.

[5] C. C. Craig, "A note on Sheppard's corrections," Annals of Math. Stat., Vol. 12 (1941),
pp. 339-…

[6] P. S. Dwyer, "The calculation of correlation coefficients from ungrouped data with
modern calculating machines," Jour. of Am. Stat. Assn., Vol. 36 (1940), pp. …

[7] H. L. Jones, "The use of grouped measurements," Jour. of Am. Stat. Assn., Vol. 30,
pp. 625-…



ON THE CORRECT USE OF BAYES' FORMULA 

By R. v. Mises
Harvard University

The problem that we try to solve by using Bayes' formula consists in making
an inference from an observed statistical value upon the unknown value of a
parameter, and in examining the chance of this inference being correct. One
may call this the principal problem of practical statistics or the estimation
problem, or, as the author put it in German (Rueckschluss-Wahrscheinlichkeit),
the problem of inference probability; at any rate we encounter this kind of problem
in various forms in almost every branch of statistical investigation. It will be
convenient to base the following discussion on a concrete question in quite
specified form which will allow us to see more clearly the points that are to be
stressed in this paper.


1. The problem. In examining the quality of water supplies with respect
to the number of bacteria of a certain kind they contain, a definite procedure is
usually adopted. One takes n = 5 samples out of the water, each sample of
exactly 10 ccm. Then by a certain biological test one finds out whether or not
each sample contains at least one bacterium of the kind under consideration. The
number x (zero to five) of positive tests is the observed value from which an
inference is drawn upon the probability θ for a sample containing at least one
bacterium. It is assumed that this θ is connected with the average number λ
of bacteria per 10 ccm by

$$\theta = 1 - e^{-\lambda}, \qquad \theta = \theta_1 = 0.63 \ \text{for}\ \lambda = 1 \tag{1}$$

according to Poisson's law. A particular question which we want to answer is
this: What is the chance of being right, if we conclude from the observed fact
x = 0 (in other cases from x = 1) that θ lies between 0 and θ_1 = 0.63 (or λ
between 0 and 1)?

For a given θ the probability of getting x positive tests out of n tests is
according to Bernoulli's formula

$$p(x \mid \theta) = \binom{n}{x}\theta^x(1 - \theta)^{n-x}. \tag{2}$$
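Equations (1) and (2) are simple to evaluate. The following sketch only restates them in code; the function names are introduced here for illustration.

```python
from math import comb, exp

def theta_from_lambda(lam):
    """Equation (1): chance that a 10 ccm sample holds at least one bacterium."""
    return 1.0 - exp(-lam)

def p_x_given_theta(x, theta, n=5):
    """Equation (2): chance of x positive tests out of n for a given theta."""
    return comb(n, x) * theta**x * (1.0 - theta)**(n - x)

theta_1 = theta_from_lambda(1.0)
distribution = [p_x_given_theta(x, theta_1) for x in range(6)]
print(round(theta_1, 4), [round(p, 4) for p in distribution])
```

With λ = 1 the chance of no positive test among five samples is exp(-5), so the outcome x = 0 would be rare for a supply sitting exactly at the limit θ_1.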

The chance of having a θ-value between 0 and θ_1 when x positive tests are
observed is according to Bayes' formula

$$P_x(\theta_1) = \frac{\int_0^{\theta_1} p(x \mid \theta)\, dP(\theta)}{\int_0^{1} p(x \mid \theta)\, dP(\theta)} \tag{3}$$

where P(θ) is a distribution function, monotonically increasing from 0 to 1
and usually known as the a priori probability.


2. The a priori. The function P(θ) is generally considered as a troublemaker.
As one is used to calling P the a priori probability, most people think that it has
something to do with those absurd conceptions of non-empirical, a priori known
probabilities that cannot be tested by any experiments, etc. This cannot be
refuted strongly enough. In our particular case the meaning of P(θ) is the
following. Each probability statement refers, as we know, to a certain infinite
sequence of experiments or trials which form a kollektiv. If we ask for the
chance $P_x(\theta_1)$ of having a θ-value between zero and θ_1 when a certain x has been
observed, we have in mind a sequence of trials each consisting of two steps:
first, picking out one particular water supply, and then testing the number x
of samples that contain bacilli. Among the first N trials of this kind we shall
have $N_1$ cases where the θ-value for the water supply picked out lies between 0
and θ_1, then we shall have $N_x$ cases where the number of positive tests is x,
and finally in a number $N_{1x}$ of cases both conditions will be fulfilled. The
chance $P_x(\theta_1)$ we ask for is then by definition

$$P_x(\theta_1) = \lim \frac{N_{1x}}{N_x} \tag{4}$$

while the so-called a priori probability is

$$P(\theta_1) = \lim \frac{N_1}{N}. \tag{5}$$

Later on we shall also use the probability

$$Q(x) = \lim \frac{N_x}{N}. \tag{6}$$

All these magnitudes are to the same extent empirical or non-empirical. They
are "empirical," since we get approximate values for them out of a long sequence
of experiments, and they may be considered as something super-empirical since
the concepts of an infinite sequence and of a limit are used in the definition, as
each theory must involve a certain amount of "idealization."

In order to avoid the above mentioned equivocation the author had
suggested a long time ago¹ to call the probabilities corresponding to P(θ) and $P_x(\theta)$
respectively the initial and the final probability. Another expression which
could be used in connection with the distribution function P(θ) is overall
distribution, since it means the distribution of θ-values within the total mass of
samples, not regarding what the values of x are in each case.


¹ Cf. reference [2], p. 162.

3. No randomness required. Now, the first remark we have to make is the
following: In the Bayes' formula (3) the existence of a function P(θ) is
presupposed, i.e. we assume that in the sequence of successive trials the frequency of
those cases in which θ falls into a certain region has a definite limit. But
nothing is assumed about this limit being independent of a place selection.
The sequence of trials must fulfill the first condition of a kollektiv with respect
to θ but not the second; in other words, randomness in the succession of θ
is not required. Thus we may say that θ is not supposed to be a chance variable
in the usual sense of this term. Sometimes people are shocked by the idea that
in Bayes' theory the individual cases are supposed to be picked out at random,
and it is often considered as a superiority of the method of confidence intervals
that here such assumption is avoided.

It is true that in the latter method even the existence of the frequency limit is
not required,² but this does not seem to make any essential difference. The
fact is that, if we want to make an inference upon the value of θ, i.e. an assertion
about the chance of θ falling into a certain interval, we have to assume that in
the long run different θ-values may occur with certain frequencies.

It may be useful to have different expressions for the two cases where a
frequency limit is or is not supposed to be independent of an arbitrary place
selection. As we use the word probability in the first case it seems suitable to
apply the word chance in the second. Thus, if P(θ) is the initial or the overall
chance of θ we would say that $P_x(\theta_1)$ is the final chance of θ being smaller than
or equal to θ_1 for a certain observed x-value. When P(θ) is supposed to be a
probability, i.e. to fulfill the condition of randomness, then $P_x(\theta_1)$ will have this
property too and has to be called probability.


4. Inequalities for the final chance $P_x(\theta)$. A much better founded objection
against the practical application of Bayes' formula consists in saying that in
most cases we have no sufficient information about the function P(θ). This
undeniable fact leads often to an incorrect simplification of the formula by
replacing in it dP(θ) by dθ, which means an a priori probability of constant density.
It is obvious that it is no solution, when one does not know what P(θ) is, to assume
it equal to θ. On the other hand, if we accept Bayes' formula as correct (and
there is no reason for not doing so) we learn that the value $P_x(\theta)$ we ask for
depends essentially on P(θ), and is undetermined as far as P(θ) is undetermined.
The only consequence in this situation is, first to use all information we can get
about P(θ), and then to make the answer as vague or undetermined as the
incompleteness of this information requires.

One way to do this consists in setting up inequalities for $P_x(\theta)$ based on
certain inequalities for P(θ). A formula which turns out to be useful, at least
in a well-known asymptotic problem, is the following:

² Cf. reference [4], p. 201.

Let us consider the general case where θ stands for several variable parameters,
… interested in the final probability $P_C$ of a subset C of A given by

$$P_C = \frac{\int_C p(x \mid \theta)\, dP(\theta)}{\int_A p(x \mid \theta)\, dP(\theta)} \tag{6'}$$

where x is taken as known.

Let $P'_C$ be the value of $P_C$ under the assumption of a constant initial density
and denote by $P_B$, $P'_B$ the analogous values for a subset B which includes C so
as to have

$$C \subset B \subset A. \tag{7}$$

The quantities $P'_B$ and $P'_C$ depend only on the function $p(x \mid \theta)$ and the sets B
and C, while $P_B$ and $P_C$ change with P(θ).

If we assume that the initial density p(θ) has the limits

$$
\begin{aligned}
m &\le p(\theta) \le M && \text{within } B \\
m' &\le p(\theta) \le M' && \text{within } A - B \ (A \text{ minus } B),
\end{aligned}
\tag{8}
$$

it can easily be shown that

$$
\frac{1}{\dfrac{M}{m}P'_B + \dfrac{M'}{m}(1 - P'_B)} \;\le\; \frac{P_C}{P'_C} \;\le\; \frac{1}{\dfrac{m}{M}P'_B + \dfrac{m'}{M}(1 - P'_B)}. \tag{9}
$$

We may consider the following application of these inequalities.

If we are concerned with a case where a great number n of trials is involved,
the function $p(x \mid \theta)$, which determines the P' values, shows an increasing
concentration at a certain point of the set A. In other words, for large n we have
a subset B more and more reducing to one single point for which $P'_B$ is as near
to 1 as we want. If we then assume that the density p(θ) is continuous and
bounded, the difference between m and M tends to zero, and if m is supposed to
have a positive lower bound, both the first and the last expression in (9) tend
to unity, or $P_C$ approaches $P'_C$. This is a generalized form of the statement
which the author proved for the first time in 1919,³ that in the original Bayes'
problem where we are concerned with n repetitive observations of an
alternative, the final probability becomes more and more independent of the initial
probability P(θ) as the number n of observations involved increases.

³ Cf. reference [1], p. 81.

5. Using previous experience. The inequalities (9) may be of use in many
cases. But to be sure, in general, they are not the basis upon which practical
estimation judgments rest. Everybody acquainted with the conditions of testing
water supplies takes it for granted that the outcome x = 0 (no positive test)
supplies a sufficient reason for the statement θ ≤ θ_1 = 0.63 (less than one
bacterium per 10 cc). But, if nothing were known about the initial distribution
P(θ), we could assume P(θ) in the form

$$P(\theta) = \theta^m, \qquad p(\theta) = m\theta^{m-1} \quad \text{for } 0 \le \theta \le 1,$$

with a large value of m. With n = 5, x = 0 equations (2) and (3) give $P_0(\theta_1)$ =
0.50 for m = 10, and $P_0(\theta_1)$ = 0.88 if m is 5. These values are much too low
to justify any recommendation of a water supply for which x was found to be
zero. Thus we have to ask: What is the real source of the confidence we put in
the inference from x = 0 upon θ ≤ θ_1?
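The dependence of the final chance on the assumed initial distribution can be reproduced numerically. The sketch below evaluates formula (3) for x = 0 with the initial density mθ^(m-1); the Simpson-rule integrator and the function names are conveniences introduced for the illustration, not part of the paper.

```python
from math import exp

def simpson(f, a, b, steps=2000):
    """Composite Simpson rule on [a, b] with an even number of steps."""
    h = (b - a) / steps
    total = f(a) + f(b)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

def final_chance_x0(m, n=5, theta1=1 - exp(-1)):
    """Formula (3) for x = 0 with initial distribution P(theta) = theta**m."""
    def integrand(t):
        return (1 - t) ** n * m * t ** (m - 1)
    return simpson(integrand, 0.0, theta1) / simpson(integrand, 0.0, 1.0)

for m in (1, 5, 10):
    print(m, round(final_chance_x0(m), 3))
```

For m = 5 the computation gives about 0.88 and for m = 10 about one half, so an initial distribution concentrated on large values of θ leaves little confidence in the inference from x = 0; m = 1 is the case of constant density.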

There is no doubt that this confidence is based on previous experience. We
know that the water supplies subjected to the routine test in the past formed a
class of clean rather than dirty water, and we rely on a new sample belonging
to the same class. The author was given the following information about the
results under the jurisdiction of Massachusetts during the last decade. Out
of a total of N = 3420 examinations there were found

3086 cases with x = 0 (no positive test)

279 cases with x = 1 (one positive test)

32 cases with x = 2

15 cases with x = 3

5 cases with x = 4

3 cases with x = 5

The overwhelming majority of cases with x = 0 is evident. The question is
only how we can use these statistics of past experiments for obtaining a
numerical inference upon the value of $P_x(\theta)$.

If the initial distribution P(θ) were known, we could find the probability $Q_x$
of getting x positive tests out of n:

$$Q_x = \int_0^1 p(x \mid \theta)\, dP(\theta) = \binom{n}{x}\int_0^1 \theta^x(1 - \theta)^{n-x}\, dP(\theta). \tag{10}$$

Using the numbers N, $N_x$ introduced in section 2 the probability Q(x)
is defined by equation (6).

If the number N of past examinations is considered as sufficiently large, we
can take the ratios 3086/3420, 279/3420 etc. as approximate values for $Q_0$,
$Q_1$ etc. Now, according to the well-known identities

$$\sum_{x=0}^{n} x\binom{n}{x}\theta^x(1 - \theta)^{n-x} = n\theta, \tag{11}$$

$$\sum_{x=0}^{n} x(x - 1)\binom{n}{x}\theta^x(1 - \theta)^{n-x} = n(n - 1)\theta^2, \tag{12}$$

and using (10) we can derive from the values $Q_0, Q_1, \cdots, Q_n$ the first and second
moments of the distribution function P(θ):

$$
\begin{aligned}
M_1 &= \int_0^1 \theta\, dP(\theta) = \frac{1}{n}\sum_{x=0}^{n} x\,Q_x \\
M_2 &= \int_0^1 \theta^2\, dP(\theta) = \frac{1}{n(n - 1)}\sum_{x=0}^{n} x(x - 1)\,Q_x.
\end{aligned}
\tag{13}
$$

If we introduce here the above mentioned empirical ratios for $Q_x$ we find the
approximate values for the first and second moments of P(θ):

$$M_1 = 0.02474, \qquad M_2 = 0.00401. \tag{13'}$$
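The moments in (13') follow directly from the record of past examinations. In the sketch below the count for x = 4 is taken as 5, which is the value consistent both with the stated total N = 3420 and with the printed moments; the variable names are illustrative.

```python
# Numbers of past examinations with x positive tests out of n = 5.
counts = {0: 3086, 1: 279, 2: 32, 3: 15, 4: 5, 5: 3}
n = 5

N = sum(counts.values())
Q = {x: c / N for x, c in counts.items()}      # empirical ratios used for Q_x

# Equations (13): first and second moments of the initial distribution.
M1 = sum(x * Q[x] for x in Q) / n
M2 = sum(x * (x - 1) * Q[x] for x in Q) / (n * (n - 1))

print(N, round(M1, 5), round(M2, 5))
```

The two moments are all that the record of test results determines about P(θ) to this order; the distribution itself remains unknown.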


6. Determination of a distribution function by its first moments. In an
earlier paper the author showed [3] how the exact upper and lower bounds for a
distribution function P(θ) can be found, if the expected values of two functions
f(θ) and g(θ) are known. The only condition was that the curve represented
in a Cartesian coordinate system by x = f(θ), y = g(θ) is convex. Let us take

$$
\begin{aligned}
f(\theta) &= g(\theta) = 0 && \text{for } \theta < 0 \\
f(\theta) &= \theta, \quad g(\theta) = \theta^2 && \text{for } 0 \le \theta \le 1 \\
f(\theta) &= g(\theta) = 1 && \text{for } \theta > 1.
\end{aligned}
\tag{14}
$$

In this case the condition is fulfilled and the expected values of f(θ) and g(θ)
are the moments $M_1$, $M_2$, respectively. The results obtained in the paper
quoted above take the following form:

First, we have to derive from the given values $M_1$ and $M_2$ two points θ' and
θ'' of the interval 0 ≤ θ ≤ 1:

$$\theta' = \frac{M_1 - M_2}{1 - M_1}, \qquad \theta'' = \frac{M_2}{M_1}. \tag{15}$$

Then the limits for P(θ) are:

$$
\begin{aligned}
0 \le P(\theta) &\le \frac{M_2 - M_1^2}{M_2 - 2M_1\theta + \theta^2} && \text{for } 0 \le \theta \le \theta' \\
1 - M_1 - \frac{M_1 - M_2}{\theta} \le P(\theta) &\le 1 - \frac{M_2 - M_1\theta}{1 - \theta} && \text{for } \theta' \le \theta \le \theta'' \\
\frac{(M_1 - \theta)^2}{M_2 - 2M_1\theta + \theta^2} \le P(\theta) &\le 1 && \text{for } \theta'' \le \theta \le 1.
\end{aligned}
\tag{16}
$$

In our case we find θ' = 0.0213, θ'' = 0.1619 and the point θ_1 = 0.6321 falls
into the third interval θ'', 1. The lines O A B C and O D E F G in Fig. 1 show
(slightly distorted) the lower and upper bounds for P(θ).
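The points θ' and θ'' and the resulting limits are readily computed. The sketch below assumes the forms θ' = (M_1 - M_2)/(1 - M_1), θ'' = M_2/M_1 and the three pairs of limits as they follow from the two-moment problem on the interval from 0 to 1; the moments are entered as exact ratios of the past counts, with the count for x = 4 taken as 5. Function names are illustrative.

```python
from math import exp

def theta_points(m1, m2):
    """The two division points theta' and theta'' of the interval (0, 1)."""
    return (m1 - m2) / (1 - m1), m2 / m1

def bounds(theta, m1, m2):
    """Lower and upper limits for P(theta) given the first two moments."""
    t_a, t_b = theta_points(m1, m2)
    q = m2 - 2 * m1 * theta + theta ** 2
    if theta <= t_a:
        return 0.0, (m2 - m1 ** 2) / q
    if theta <= t_b:
        return 1 - m1 - (m1 - m2) / theta, 1 - (m2 - m1 * theta) / (1 - theta)
    return (theta - m1) ** 2 / q, 1.0

m1 = 423 / 17100      # sum of x * N_x divided by n * N
m2 = 274 / 68400      # sum of x(x-1) * N_x divided by n(n-1) * N
t_a, t_b = theta_points(m1, m2)
theta_1 = 1 - exp(-1)
print(round(t_a, 4), round(t_b, 4), bounds(theta_1, m1, m2))
```

The division points come out as 0.0213 and 0.1619, and at θ_1 the lower limit for P(θ) is already above 0.99, which is what makes the bound of the next section so sharp.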



Fig. 1. The limits of the overall distribution function
Fig. 2. The 99% region in the method of confidence intervals


7. Application to Bayes' formula. The inequalities (16) enable us to find in a
simple way a lower bound for the end probability $P_x(\theta_1)$ defined by (2) and (3)
in the case x = 0. Let us denote by A the numerator in (3) and by B the
supplementary integral

$$B = \int_{\theta_1}^{1} p(x \mid \theta)\, dP(\theta), \tag{17}$$

so as to have A + B for the denominator in (3). If the subscripts min and max
denote a lower and upper bound respectively we can write

$$P_x(\theta_1) = \frac{A}{A + B} \ge \frac{A_{\min}}{A_{\min} + B_{\max}}. \tag{18}$$

Now, taking x = 0 we find by product integration

$$A = P(\theta_1)(1 - \theta_1)^n + n\int_0^{\theta_1} P(\theta)(1 - \theta)^{n-1}\, d\theta. \tag{19}$$

Therefore, $A_{\min}$ is found when we introduce in this expression the lower limit
for P(θ) as given in (16). If we do this and use the values for $M_1$ and $M_2$
according to (13'), numerical computation leads to $A_{\min}$ = 0.712.

In the same way we obtain B in the form

$$B = -P(\theta_1)(1 - \theta_1)^n + n\int_{\theta_1}^{1} P(\theta)(1 - \theta)^{n-1}\, d\theta. \tag{20}$$

The upper bound $B_{\max}$ is reached, if we introduce in the integral P(θ) = 1 and
in the first term the minimum value for P(θ_1) following from (16). The second
term becomes thus equal to $(1 - \theta_1)^n$ and the numerical result is $B_{\max}$ =
0.0000607. Therefore the inequality (18) supplies

(18') $P_0(\theta_1) \ge 0.99915$.

The final outcome secured in this way can be formulated as follows: If we
assume that in continuing the experiments the distribution of test results will be
about the same as it has been in the past 3420 cases, we have a chance of more than
99.9% of being right, when we state in each case of no positive test that the density
of bacteria is less than 1 per 10 ccm.

The high value of 99.9% for $P_0(\theta_1)$ is of course strictly bound to the
assumption that the entire mass of water supplies to be tested is homogeneous and
sufficiently characterized by the distribution of test results found in the past.
If e.g. we had to assume that the six possible values for x (0 to 5) in the long
run appear with equal frequencies so as to have $Q_0 = Q_1 = \cdots = Q_5 = 1/6$, the
same method would give $M_1 = 1/2$, $M_2 = 1/3$, then θ' = 1/3, θ'' = 2/3, and the final
result would be $P_0(\theta_1) \ge 0.73$. The assumption of a constant initial density
P(θ) = θ would give $P_0(\theta_1) = P'_0(\theta_1) = 0.9975$, a little less than the value
found above in (18').
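The computation of the lower bound can be sketched as follows. The sketch assumes the lower limits for P(θ) in the form 0, 1 - M_1 - (M_1 - M_2)/θ and (M_1 - θ)^2/(M_2 - 2M_1θ + θ^2) on the three intervals, evaluates (19) by a Simpson rule, and takes B_max as (1 - θ_1)^n [1 - P_min(θ_1)]. The integrator and the names are illustrative, and intermediate figures obtained in this way need not agree to the last digit with the rounded values printed in the text.

```python
from math import exp

def lower_limit(theta, m1, m2):
    """Lower limit for P(theta) given the first two moments m1, m2."""
    t_a = (m1 - m2) / (1 - m1)
    t_b = m2 / m1
    if theta <= t_a:
        return 0.0
    if theta <= t_b:
        return 1 - m1 - (m1 - m2) / theta
    return (theta - m1) ** 2 / (m2 - 2 * m1 * theta + theta ** 2)

def simpson(f, a, b, steps=20000):
    """Composite Simpson rule on [a, b] with an even number of steps."""
    h = (b - a) / steps
    total = f(a) + f(b)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

def final_chance_lower_bound(m1, m2, n=5, theta1=1 - exp(-1)):
    """Lower bound A_min / (A_min + B_max) for P_0(theta1), the case x = 0."""
    p1 = lower_limit(theta1, m1, m2)
    a_min = p1 * (1 - theta1) ** n + n * simpson(
        lambda t: lower_limit(t, m1, m2) * (1 - t) ** (n - 1), 0.0, theta1)
    b_max = (1 - p1) * (1 - theta1) ** n
    return a_min / (a_min + b_max)

past = final_chance_lower_bound(423 / 17100, 274 / 68400)   # moments from past tests
equal = final_chance_lower_bound(1 / 2, 1 / 3)               # all Q_x equal to 1/6
print(past > 0.999, round(equal, 2))
```

With the moments of the past examinations the bound exceeds 99.9%, while with equal frequencies of the six outcomes it falls to roughly three quarters, in line with the two cases discussed in the text.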


8. The case x = 1. The results are less favorable in the case of one positive
test, x = 1. Here we have

$$p(1 \mid \theta) = n\theta(1 - \theta)^{n-1} = 5\theta(1 - \theta)^4, \tag{21}$$

and the derivative of p is first positive, then negative. We can conclude from
Fig. 1 that the minimum value for A and the maximum for B will be reached
when the distribution function P(θ) is represented by the line O D I H J G
where I H is horizontal and H the point on B C with abscissa θ_1. The abscissa
θ_0 of I is determined by the equation

$$\frac{M_2 - M_1^2}{M_2 - 2M_1\theta_0 + \theta_0^2} = \frac{(M_1 - \theta_1)^2}{M_2 - 2M_1\theta_1 + \theta_1^2}, \tag{22}$$

which supplies θ_0 = 0.0190. We then have

$$A_{\min} = \int_0^{\theta_0} p(1 \mid \theta)\, dP(\theta), \tag{23}$$

with the value $p(1 \mid \theta)$ from (21) and with

$$P(\theta) = \frac{M_2 - M_1^2}{M_2 - 2M_1\theta + \theta^2}$$

according to (16). On the other hand $B_{\max}$ is found, as in the former case, to be

$$B_{\max} = p(1 \mid \theta_1)[1 - P(\theta_1)], \tag{24}$$

where we have to take for P(θ_1) its minimum value according to (16). The
numerical computation yields $A_{\min}$ = … and $B_{\max}$ = … so as to give

$$P_1(\theta_1) \ge 0.92.$$

The result is that under the assumption above mentioned we have better
than 92% chance of being right, if we predict each time one out of five tests has
been positive that the density of bacilli is less than 1 per 10 ccm. The chance
computed under the assumption of a uniform initial distribution P(θ) = θ
would be 0.97.
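The figures quoted for a uniform initial distribution can be verified directly from formula (3) with dP(θ) = dθ. The following sketch uses a Simpson rule for the two integrals; the integrator and the function name are introduced only for this illustration.

```python
from math import exp

def simpson(f, a, b, steps=2000):
    """Composite Simpson rule on [a, b] with an even number of steps."""
    h = (b - a) / steps
    total = f(a) + f(b)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

def final_chance_uniform(x, n=5, theta1=1 - exp(-1)):
    """Formula (3) with constant initial density: chance that theta <= theta1."""
    def integrand(t):
        return t ** x * (1 - t) ** (n - x)
    return simpson(integrand, 0.0, theta1) / simpson(integrand, 0.0, 1.0)

print(round(final_chance_uniform(0), 4), round(final_chance_uniform(1), 2))
```

The binomial coefficient cancels between numerator and denominator, so it is omitted; the two results are 0.9975 for x = 0 and 0.97 for x = 1, the values stated in the text.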

9. The method of confidence intervals. One may ask what kind of answer
to our questions can be deduced from the principle of confidence intervals.
This method has undeniably to its credit that no use is made here of the initial
distribution $P(\theta)$ and that, therefore, all its statements are completely
independent of what is assumed about $P(\theta)$.

In order to apply this method⁴ we have to select for a given degree of
confidence, say $\alpha = 0.99$, a region of acceptance, i.e. an area in the two dimensional
x, $\theta$ plane limited by two lines $x_1(\theta)$ and $x_2(\theta)$ so as to have for each $\theta$

(25) $\mathrm{Prob}\{x_1(\theta) \le x \le x_2(\theta)\} \ge \alpha$.

The region is, of course, not uniquely determined by (25). In our case, however,
one will generally agree that the best way to determine the region consists
in assuming for $x_1(\theta)$ and $x_2(\theta)$ two step lines with steps at the integer values
x = 0, 1, 2, ... as indicated in Fig. 2. Then the formula (2) for $p(x \mid \theta)$,
combined with (25), supplies the abscissae of the steps, if $\alpha$ is given. If we transform
the limits for $\theta$ into limits for $\lambda$ using equation (1), the final outcome reads as
follows:

Whatever the initial distribution $P(\theta)$ may be, we have a chance of 99% of being
right, if we predict:

each time x = 0 is observed that $\lambda$ lies between 0 and 0.92,

each time x = 1 is observed that $\lambda$ lies between 0.002 and 1.51,

each time x = 2 is observed that $\lambda$ lies between 0.036 and 2.24,

each time x = 3 is observed that $\lambda$ lies between 0.112 and 3.41,

each time x = 4 is observed that $\lambda$ lies between 0.25 and 8.48,

each time x = 5 is observed that $\lambda$ lies between 0.51 and $\infty$.

This is a result independent of any assumption
on $P(\theta)$. But it is essential that the chance of $\alpha$ = 99% holds only for the six
joint statements as a whole. This means it may happen that, for instance, the
first assertion (that $\lambda$ is smaller than 0.92 in the case x = 0) is correct but very
seldom or even never, while other assertions (e.g. those for x = 4 and 5) have
a much greater chance than 99% of being correct. Whether this happens or
not depends on the initial distribution $P(\theta)$. As long as we know nothing about
$P(\theta)$ we are not in the position to conclude, by using the method of confidence
intervals, that the particular statement "$\lambda \le 0.92$ if x = 0" has a chance of
99% or even any chance at all of being correct. On the other hand, when x = 0
has been observed we are in no way interested in consequences that may be
drawn in the case x = 4 or x = 5 or in a set of statements that includes the
cases x = 4 and x = 5. The only practical question that is relevant to the
purpose for which the tests are made is this: What can we conclude from the fact
that in a certain instance x = 0 has been observed (or in another instance x = 1)?
It seems that the method of confidence intervals, discarding any consideration
of the initial distribution, can supply no contribution towards answering
this particular question.

⁴ Cf. reference [5] and reference [4], p. 203.


[1] R. v. Mises, "Fundamentalsätze der Wahrscheinlichkeitsrechnung," Math. Zeit., Vol. 4 (1919), pp. 1-97.

[2] R. v. Mises, Wahrscheinlichkeitsrechnung und ihre Anwendung in der Statistik und theoretischen Physik, Leipzig und Wien, 1931.

[3] R. v. Mises, "The limits of a distribution function if two expected values are given," Annals of Math. Stat., Vol. 10 (1939), pp. 99-104.

[4] R. v. Mises, "On the foundation of probability and statistics," Annals of Math. Stat., Vol. 12 (1941), pp. 191-205.

[5] J. Neyman, Roy. Stat. Soc. Jour., Vol. 97 (1934), pp. 590-592.



AN ITERATIVE METHOD OF ADJUSTING SAMPLE FREQUENCY TABLES
WHEN EXPECTED MARGINAL TOTALS ARE KNOWN

By Frederick F. Stephan

Cornell University and U. S. Bureau

1. Introduction. In a previous paper by W. Edwards Deming and the
author [1] the method of least squares was applied to the adjustment of sample
frequency tables for which the expected values of the marginal totals are known.
From observations on a sample the frequencies $n_{ij}$ for the cell in the ith row
and jth column of a two dimensional table and the r row and s column totals,
$n_{i.}$ and $n_{.j}$, are obtained. These frequencies are subject to the errors of random
sampling and it is desired to adjust them so that the row and column totals
will agree with their expected values, $m_{i.}$ and $m_{.j}$, which are known. The
adjustment involves the solution of the r + s - 1 normal equations

(1) $n_{i.}\lambda_{i.} + \sum_j n_{ij}\lambda_{.j} = m_{i.} - n_{i.}$,  i = 1, 2, ..., r
    $\sum_i n_{ij}\lambda_{i.} + n_{.j}\lambda_{.j} = m_{.j} - n_{.j}$,  j = 1, 2, ..., s - 1

where the $\lambda$ are Lagrange multipliers from which are calculated the adjusted
frequencies

(2) $m_{ij} = n_{ij}(1 + \lambda_{i.} + \lambda_{.j})$.

Similar equations arise in the three dimensional case.

A method of iterative proportions was presented for effecting the adjustments
more conveniently than by solving the normal and condition equations, and it
was stated that "the final results coincide with the least squares solution."
This statement is incorrect, for although the adjusted values satisfy the condition
equations, they do not satisfy the normal equations and hence they provide
only an approximation to the solution. The method of iterative proportions has
several interesting characteristics that will be discussed in a later section.
This paper now presents a method that converges to the values given by the
least squares adjustment and is self correcting. It can be used with any set of
data and weights for which a least squares solution exists. The two-dimensional
case will be considered first.

2. The two-dimensional case; expected row and column totals known.
Assume that a sample of n items is drawn at random and cross-classified in a
table of r rows and s columns. As in the previous paper, let $n_{ij}$ be the frequency
in the ith row and jth column of the two-way frequency distribution. Indicate
summation by substituting a dot for the letter over which the summation is to
be performed. Then $n_{i.}$ and $n_{.j}$ are the marginal totals for the ith row and
jth column respectively. Let $m_{i.}$ and $m_{.j}$ be the expected values of these
marginal totals calculated from other information or from theoretical considerations,
and $c_{ij}$ a set of constants known or estimated to be proportional to the
reciprocals of the weights of the $n_{ij}$, i.e. proportional to their error variances.
Since the weights are positive, the $c_{ij}$ are non-negative and finite. It is assumed
that the set of weights is such that for the given data an adjustment exists.

The least squares adjusted frequencies $m_{ij}$ can be computed from the given
numbers $c_{ij}$, $n_{ij}$, $m_{i.}$, and $m_{.j}$ by a series of approximate adjustments in a
manner now to be explained. Let $m_{ij}^{(p)}$ be the pth approximation to $m_{ij}$. In
conformity with this notation $m_{ij}^{(0)} = n_{ij}$. Let

(3) $d_{i.}^{(p)} = m_{i.} - m_{i.}^{(p)}$,  $d_{.j}^{(p)} = m_{.j} - m_{.j}^{(p)}$

be corrections that must be added to the totals $m_{i.}^{(p)}$ and $m_{.j}^{(p)}$ to produce the
totals of the least squares adjusted frequencies. As $d \to 0$, $m^{(p)} \to m$. Let $\theta_{i.}^{(p)}$ and $\theta_{.j}^{(p)}$ be constants
determined arbitrarily between the limits set by equations (5) to (7). Any one
$\lambda$ may be fixed arbitrarily and kept constant through successive approximations.
Note that $\lambda_{i.}^{(0)} = \lambda_{.j}^{(0)} = 0$ and that, if at every step we set $\lambda_{.s}^{(p)} = 0$, the $\lambda^{(p)}$
are approximations to the Lagrange multipliers in the normal equations. After
p steps in the iterative process the approximate adjusted frequencies will be

(4) $m_{ij}^{(p)} = n_{ij} + c_{ij}(\lambda_{i.}^{(p)} + \lambda_{.j}^{(p)})$.

The following conditions, derived from (19), (23), and (24), are sufficient to
make the successive approximations converge to the least squares adjusted
frequencies:

(5) $\lambda_{i.}^{(p)} = \lambda_{i.}^{(p-1)} + \theta_{i.}^{(p)} d_{i.}^{(p-1)}/c_{i.}$,  $\lambda_{.j}^{(p)} = \lambda_{.j}^{(p-1)} + \theta_{.j}^{(p)} d_{.j}^{(p-1)}/c_{.j}$,

(6) $0 \le \theta_{i.}^{(p)}$,  $0 \le \theta_{.j}^{(p)}$,  $\theta_{i.}^{(p)} + \theta_{.j}^{(p)} \le 2$,

and, for at least one pair ij,

(7) $\theta_{i.}^{(p)}(d_{i.}^{(p-1)})^2 + \theta_{.j}^{(p)}(d_{.j}^{(p-1)})^2 > 0$;  $\theta_{i.}^{(p)} + \theta_{.j}^{(p)} < 2$.

The $\theta$'s are introduced because in actual computations the successive
approximations $\lambda^{(p)}$ can only be calculated to a limited number of digits and because
the adjustment may progress more rapidly if the computer is permitted to use
his judgment in determining the approximations as he observes the course of
previous approximations.

The process of adjustment is continued until the $d_{i.}^{(p)}$ and $d_{.j}^{(p)}$ become
sufficiently small to provide the desired degree of agreement between the adjusted
and expected row and column totals.
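The iteration described by equations (3) to (5) can be sketched in a few lines. The following is an illustrative sketch only, under the assumption that every row correction is made with $\theta = 1$ (columns left alone) and every column correction with $\theta = 1$ (rows left alone), in alternation. The function name, the 2 × 2 data, and the choice $c_{ij} = n_{ij}$ are hypothetical and do not come from the paper's example.

```python
def least_squares_adjust(n, c, row_targets, col_targets, sweeps=200):
    """Alternate row and column corrections, each made with theta = 1."""
    r, s = len(n), len(n[0])
    lam_row = [0.0] * r                      # lambda_{i.}, starting at 0
    lam_col = [0.0] * s                      # lambda_{.j}, starting at 0
    c_row = [sum(c[i]) for i in range(r)]                       # c_{i.}
    c_col = [sum(c[i][j] for i in range(r)) for j in range(s)]  # c_{.j}

    def current():
        # Equation (4): m_ij = n_ij + c_ij (lambda_i. + lambda_.j)
        return [[n[i][j] + c[i][j] * (lam_row[i] + lam_col[j])
                 for j in range(s)] for i in range(r)]

    for _ in range(sweeps):
        m = current()
        for i in range(r):                   # row step, equations (3) and (5)
            d = row_targets[i] - sum(m[i])
            lam_row[i] += d / c_row[i]
        m = current()
        for j in range(s):                   # column step
            d = col_targets[j] - sum(m[i][j] for i in range(r))
            lam_col[j] += d / c_col[j]
    return current()


# Hypothetical data: sample frequencies and expected marginal totals.
n = [[40.0, 30.0], [20.0, 10.0]]
m = least_squares_adjust(n, n, [75.0, 25.0], [55.0, 45.0])
print(m)
```

For these data the adjusted table can be found by hand from the condition that $(m_{ij} - n_{ij})/c_{ij}$ is a sum of a row term and a column term, which gives 39.6, 35.4, 15.4, 9.6; the sketch should reproduce those values closely.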


3. Example. The following example shows the steps in the adjustment for
a table of 3 rows and 4 columns with every $\theta^{(p)}$ approximately equal to 1.



[Table: steps of the adjustment for the twelve cells ij = 11, ..., 34 and for the row, column, and grand totals, arranged in columns (1) to (12). The numerical entries are not legible in this copy.]

Columns (1), (2) and (4) are given. Columns (3) and (5) to (11) are calculated
in succession using equations (3), (4), and (5). It is not necessary in
practice to record the $\theta$'s or even determine their values since the $\lambda^{(p)}$ may be
determined directly at convenient values approximately equal to their corresponding
$\lambda_{i.}^{(p-1)} + d_{i.}^{(p-1)}/c_{i.}$ and $\lambda_{.j}^{(p-1)} + d_{.j}^{(p-1)}/c_{.j}$. The final adjusted
frequencies given in column (12) are derived by another repetition of the adjustment
process but the amounts involved are so small that they can be calculated
mentally and the results rounded at the same time.

4. Computing procedure. The computing procedure may be set up in any
of a number of ways to meet the preferences of the computer and the characteristics
of the problem. Ordinarily it is desirable to make every number
positive and the procedure as nearly routine as possible.

For two-dimensional adjustments the following procedure of computing alternately
by columns and by rows is convenient:

(a) Set up a table of the $c_{ij}$ in r rows and s columns. Enter the $c_{i.}$ in the
s + 1 column, the $c_{.j}$ in the r + 1 row, and $c_{..} = \sum_i c_{i.} = \sum_j c_{.j}$ in the common
cell.

(b) Calculate the quantities $A_{i.} = (d_{i.}^{(0)}/c_{i.}) + a$ and $A_{.j} = (d_{.j}^{(0)}/c_{.j}) + a$
and enter them in the s + 2 column and r + 2 row. The constant a is selected
at some value that will make all quantities in the computations positive and
may be any convenient integer greater than $2 \max |d_{i.}^{(0)}/c_{i.}|$ or $2 \max |d_{.j}^{(0)}/c_{.j}|$.

(c) Calculate the factors $\mu_{i.}^{(1)}$ approximately equal to the $A_{i.} - \tfrac{1}{2}a$ and enter
each on its corresponding row in the s + 3 column. Throughout the computations
the $\mu^{(p)}$ are merely $\lambda^{(p)} + \tfrac{1}{2}a$.

(d) Take column j and multiply each $c_{ij}$ by its corresponding $\mu_{i.}^{(1)}$, accumulating
the products in the calculating machine. Divide the sum of products
by $c_{.j}$, subtract the quotient from $A_{.j}$, and record the difference $\mu_{.j}^{(2)}$ in the jth
column on the r + 3 row. Repeat for each of the other columns.

(e) Take row i and multiply each $c_{ij}$ by its corresponding $\mu_{.j}^{(2)}$, accumulating
the products in the calculating machine. Divide the sum of products by $c_{i.}$,
subtract the quotient from $A_{i.}$, and record the difference $\mu_{i.}^{(3)}$ on the ith row
in the s + 4 column bordering the table on the right. Repeat for each of the
other rows.

(f) Repeat steps (d) and (e) alternately until a satisfactory degree of stability
is reached in the $\mu_{i.}$ and $\mu_{.j}$. Then compute each adjusted frequency as follows:

(8) $m_{ij}^{(p)} = c_{ij}(\mu_{i.}^{(p)} + \mu_{.j}^{(p)} - a) + n_{ij}$,

taking either $\mu_{i.}^{(p)} = \mu_{i.}^{(p-1)}$ or $\mu_{.j}^{(p)} = \mu_{.j}^{(p-1)}$ as the case may be.

(g) The computations may be checked at any step by computing

(9) $\sum_i \mu_{i.}^{(p)} c_{i.} = \sum_i d_{i.}^{(0)} + a c_{..} - \sum_j \mu_{.j}^{(p-1)} c_{.j}$

or

(10) $\sum_j \mu_{.j}^{(p)} c_{.j} = \sum_j d_{.j}^{(0)} + a c_{..} - \sum_i \mu_{i.}^{(p-1)} c_{i.}$


(h) At any step a constant may be added to all the $\mu_{i.}^{(p)}$ and subtracted from
all the $\mu_{.j}^{(p)}$; this may be necessary to keep the $\mu$'s all positive. It has no effect
on the value of a to be used in (8).

(i) If it is desired to "inflate" the adjusted frequencies ($\sum m_{ij} \ne \sum n_{ij}$),
first multiply each $n_{ij}$, $n_{i.}$, and $n_{.j}$ by the factor $\sum m_{ij} / \sum n_{ij}$; then proceed
as above using the products in place of their corresponding $n_{ij}$, $n_{i.}$ and $n_{.j}$.

(j) If before the iterative process has reached an acceptable adjustment it is
desired to force a satisfaction of the condition equations, compute:

(11) $m_{ij} = c_{ij}(\mu_{i.}^{(p)} + \mu_{.j}^{(p)} - a) + n_{ij} + (d_{i.}^{(p)} c_{.j} + d_{.j}^{(p)} c_{i.})/c_{..}$,

in which either the $d_{i.}^{(p)}$ or the $d_{.j}^{(p)}$ are all zero.
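Steps (a) to (f) translate almost directly into code. The sketch below is a hedged illustration rather than the author's worksheet: it keeps the quantities A and $\mu$ and the constant a of steps (b) to (e), and applies equation (8) at the end. The function name, the data, and the value a = 1 are assumptions made for the illustration.

```python
def mu_procedure(n, c, row_targets, col_targets, a, cycles=100):
    """Steps (a)-(f): alternate column and row passes on the mu factors."""
    r, s = len(n), len(n[0])
    c_row = [sum(c[i]) for i in range(r)]                       # step (a)
    c_col = [sum(c[i][j] for i in range(r)) for j in range(s)]

    # Step (b): A = d(0)/c + a for every row and every column.
    A_row = [(row_targets[i] - sum(n[i])) / c_row[i] + a for i in range(r)]
    A_col = [(col_targets[j] - sum(n[i][j] for i in range(r))) / c_col[j] + a
             for j in range(s)]

    # Step (c): starting row factors, here taken exactly equal to A - a/2.
    mu_row = [A_row[i] - a / 2 for i in range(r)]
    mu_col = [0.0] * s

    for _ in range(cycles):
        # Step (d): column pass.
        mu_col = [A_col[j] - sum(c[i][j] * mu_row[i] for i in range(r)) / c_col[j]
                  for j in range(s)]
        # Step (e): row pass.
        mu_row = [A_row[i] - sum(c[i][j] * mu_col[j] for j in range(s)) / c_row[i]
                  for i in range(r)]

    # Step (f), equation (8).
    m = [[c[i][j] * (mu_row[i] + mu_col[j] - a) + n[i][j] for j in range(s)]
         for i in range(r)]
    return m, mu_row, mu_col


# Hypothetical data; a = 1 exceeds 2 max |d(0)/c| for these margins.
n = [[40.0, 30.0], [20.0, 10.0]]
m, mu_row, mu_col = mu_procedure(n, n, [75.0, 25.0], [55.0, 45.0], a=1.0)
print(m, mu_row, mu_col)
```

Because each factor equals the corresponding $\lambda$ plus a/2, the shift by a leaves the adjusted frequencies unchanged while keeping every recorded factor positive in this example.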

5. Adjustments in three dimensions. If the sample is cross-tabulated in a
three-way frequency distribution, there are two cases that are not reducible to
two-way distributions. These are designated Case III and Case VII in the
earlier paper [1]. The adjustment equations are, respectively,

(12) $m_{ijk}^{(p)} = n_{ijk} + c_{ijk}(\lambda_{i..}^{(p)} + \lambda_{.j.}^{(p)} + \lambda_{..k}^{(p)})$,
     $m_{ijk}^{(p)} = n_{ijk} + c_{ijk}(\lambda_{ij.}^{(p)} + \lambda_{i.k}^{(p)} + \lambda_{.jk}^{(p)})$,

subject to conditions on the choice of the $\lambda$ corresponding to equations (5), (6),
and (7). For Case III, the conditions are that

(13) $0 \le \theta_{i..}^{(p)}$,  $0 \le \theta_{.j.}^{(p)}$,  $0 \le \theta_{..k}^{(p)}$,  $\theta_{i..}^{(p)} + \theta_{.j.}^{(p)} + \theta_{..k}^{(p)} \le 2$,

and for at least one triple ijk, $\theta_{i..}^{(p)}(d_{i..}^{(p-1)})^2 + \theta_{.j.}^{(p)}(d_{.j.}^{(p-1)})^2 + \theta_{..k}^{(p)}(d_{..k}^{(p-1)})^2 > 0$
and $\theta_{i..}^{(p)} + \theta_{.j.}^{(p)} + \theta_{..k}^{(p)} < 2$. Similar conditions apply to Case VII.

The computing procedure described in Section 4 can be extended readily to
the three-dimensional case. For example, in Case VII calculate $\mu_{ij.}^{(1)}$ approximately
equal to $(d_{ij.}^{(0)}/c_{ij.}) + \tfrac{1}{3}a$ and $\mu_{i.k}^{(1)}$ approximately equal to $(d_{i.k}^{(0)}/c_{i.k}) + \tfrac{1}{3}a$.
Then multiply each $c_{ijk}$ in the jk column by its corresponding $\mu_{ij.}^{(1)} + \mu_{i.k}^{(1)}$,
accumulating the products in the calculating machine. Divide the sum of the
products by $c_{.jk}$ and subtract the quotient from $(d_{.jk}^{(0)}/c_{.jk}) + a$. Record the
difference as $\mu_{.jk}^{(2)}$ and repeat the process for every other jk column. Take
$\mu_{ij.}^{(2)} = \mu_{ij.}^{(1)}$ and repeat for each ik column to obtain $\mu_{i.k}^{(3)}$; then take $\mu_{.jk}^{(3)} = \mu_{.jk}^{(2)}$
and repeat for each ij column to obtain $\mu_{ij.}^{(4)}$, and so on. The final adjusted
frequencies are

(14) $m_{ijk}^{(p)} = n_{ijk} + c_{ijk}(\mu_{ij.}^{(p)} + \mu_{i.k}^{(p)} + \mu_{.jk}^{(p)} - a)$.

6. The general case. The iterative method can be extended readily to
more than three dimensions and to various systems of condition equations. A
simple general notation may now be introduced. Let the cells be numbered in
any order from 1 to t and for the ith cell let $n_i$ be the value given by the sample,
$c_i$ a finite positive constant known or estimated to be inversely proportional to
the weight of $n_i$, $m_i$ the least squares adjusted value to be determined, $m_i^{(p)}$
the pth approximation to $m_i$, $d_i^{(p)} = m_i - m_i^{(p)}$, and $m_i^{(0)} = n_i$. Assume that
the values $m_\sigma$ of certain linear combinations of the $m_i$ are given, i.e. there is a
system of consistent linear equations of condition numbered in any order, the
$\sigma$-th equation being

(15) $\sum_i b_{i\sigma} m_i = m_\sigma$,  $\sum_i b_{i\sigma}^2 > 0$,

$b_{i\sigma}$ and $m_\sigma$ being known a priori. The corresponding linear combinations of the
$n_i$ and $d_i$ define

(16) $n_\sigma = \sum_i b_{i\sigma} n_i$,  $d_\sigma^{(p)} = \sum_i b_{i\sigma} d_i^{(p)}$.

Let

(17) $c_\sigma = \sum_i b_{i\sigma}^2 c_i$.

The pth approximation to $m_i$ is

(18) $m_i^{(p)} = n_i + c_i \sum_\sigma b_{i\sigma} \lambda_\sigma^{(p)}$,

where

(19) $\lambda_\sigma^{(p)} = \lambda_\sigma^{(p-1)} + \theta_\sigma^{(p)} d_\sigma^{(p-1)}/c_\sigma$,  $\lambda_\sigma^{(0)} = 0$,

the $\theta_\sigma^{(p)}$ and therefore the $\lambda_\sigma^{(p)}$ being arbitrary for a finite number of steps,
say p', but determined thereafter so that

(20) $2\sum_\sigma \theta_\sigma^{(p)} (d_\sigma^{(p-1)})^2/c_\sigma - \sum_i c_i \Big(\sum_\sigma b_{i\sigma}\theta_\sigma^{(p)} d_\sigma^{(p-1)}/c_\sigma\Big)^2 \ge (d_\tau^{(p-1)})^2/(c_\tau H)$,

$\tau$ being a value of $\sigma$, chosen at the pth step, for which $(d_\sigma^{(p-1)})^2/c_\sigma$ is a maximum
and H a finite number greater than 1 fixed prior to the first step as large as one
will. That this condition can be satisfied may be shown by putting $\theta_\tau^{(p)} = 1$
and $\theta_\sigma^{(p)} = 0$ ($\sigma \ne \tau$).
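The simplest admissible choice mentioned above, one multiplier corrected per step with $\theta = 1$ and all other $\theta$ equal to 0, can be sketched for an arbitrary system of condition equations. This is an illustrative sketch under two assumptions: that the constant attached to each condition equation is the sum of $b_{i\sigma}^2 c_i$ over the cells, and that the equation corrected at each step is the one with the largest $(d_\sigma)^2/c_\sigma$. The function name and the data are hypothetical.

```python
def adjust_general(n, c, b, targets, steps=500):
    """Correct one condition equation per step, the one with largest d^2/c."""
    t, k = len(n), len(targets)
    # Assumed constant for each condition equation: sum_i b_i,sigma^2 c_i.
    c_sigma = [sum(b[i][s] ** 2 * c[i] for i in range(t)) for s in range(k)]
    lam = [0.0] * k
    m = [float(x) for x in n]
    for _ in range(steps):
        # Discrepancy of every condition equation at the current approximation.
        d = [targets[s] - sum(b[i][s] * m[i] for i in range(t)) for s in range(k)]
        tau = max(range(k), key=lambda s: d[s] ** 2 / c_sigma[s])
        step = d[tau] / c_sigma[tau]          # theta_tau = 1, all others 0
        lam[tau] += step
        for i in range(t):
            m[i] += c[i] * b[i][tau] * step   # keeps m_i = n_i + c_i sum b lam
    return m, lam


# Hypothetical 2 x 2 table written as four cells (11, 12, 21, 22) with four
# condition equations: row 1, row 2, column 1, column 2.
n = [40.0, 30.0, 20.0, 10.0]
b = [[1, 0, 1, 0],
     [1, 0, 0, 1],
     [0, 1, 1, 0],
     [0, 1, 0, 1]]
m, lam = adjust_general(n, n, b, [75.0, 25.0, 55.0, 45.0])
print(m)
```

After each step the corrected equation is satisfied exactly, and the other discrepancies are reduced over successive steps; for these data the result should agree with the two-dimensional adjustment 39.6, 35.4, 15.4, 9.6.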

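A weighted average of several of the possible selections of $\theta_\sigma^{(p)}$ satisfying (20)
will also satisfy (20), positive "weights" being assumed. Let k added to the
superscript represent the kth such selection and let $\alpha^{(p,k)} > 0$ be a constant for
"weighting" the kth selection in the weighted average which may be chosen
arbitrarily except that $\sum_k \alpha^{(p,k)} = 1$. Then, if the kth selection of $\theta_\sigma^{(p)}$ is
represented by $\theta_\sigma^{(p,k)}$, the weighted averages are $\theta_\sigma^{(p)} = \sum_k \alpha^{(p,k)}\theta_\sigma^{(p,k)}$. Substitute
them in the left-hand side of (20), writing $d_\sigma$ for $d_\sigma^{(p-1)}$:

(21) $2\sum_\sigma \sum_k \alpha^{(p,k)}\theta_\sigma^{(p,k)} d_\sigma^2/c_\sigma - \sum_i c_i \Big(\sum_\sigma \sum_k b_{i\sigma}\alpha^{(p,k)}\theta_\sigma^{(p,k)} d_\sigma/c_\sigma\Big)^2$

$= \sum_k \alpha^{(p,k)}\Big(2\sum_\sigma \theta_\sigma^{(p,k)} d_\sigma^2/c_\sigma\Big) - \sum_i c_i \Big(\sum_k \alpha^{(p,k)} \sum_\sigma b_{i\sigma}\theta_\sigma^{(p,k)} d_\sigma/c_\sigma\Big)^2$,

which by the Cauchy-Schwarz inequality

$\ge \sum_k \alpha^{(p,k)}\Big(2\sum_\sigma \theta_\sigma^{(p,k)} d_\sigma^2/c_\sigma\Big) - \sum_i c_i \Big(\sum_k \alpha^{(p,k)}\Big)\Big\{\sum_k \alpha^{(p,k)}\Big(\sum_\sigma b_{i\sigma}\theta_\sigma^{(p,k)} d_\sigma/c_\sigma\Big)^2\Big\}$

$= \sum_k \alpha^{(p,k)}\Big\{2\sum_\sigma \theta_\sigma^{(p,k)} d_\sigma^2/c_\sigma - \sum_i c_i\Big(\sum_\sigma b_{i\sigma}\theta_\sigma^{(p,k)} d_\sigma/c_\sigma\Big)^2\Big\}$

$\ge \sum_k \alpha^{(p,k)} d_\tau^2/(c_\tau H) = d_\tau^2/(c_\tau H)$.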

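A simpler and more convenient but somewhat more restrictive condition may
be derived as a special case of (20). Let $\theta_\sigma^{(p)} = 0$ except for a set of one or
more $\sigma$ so selected that $b_{i\sigma'} b_{i\sigma''} = 0$ for every i and every pair $\sigma'$ and $\sigma''$ in the
set. Then (20) becomes

(22) $\sum_\sigma \theta_\sigma^{(p)}(2 - \theta_\sigma^{(p)})(d_\sigma^{(p-1)})^2/c_\sigma \ge (d_\tau^{(p-1)})^2/(c_\tau H)$.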

Differentiating partially for a maximum with n'^peet to one of the Wi* find 
that this .special case of the condition will be satisfiefl if for one ff in the set, 
say r, such that 

(23) (d'/ -’‘)Vc, > (df” y/iWfi), 
the value of 0*/' is chosen in the range, 

(24) 1/(2VS) < < 2 - l/(2^//7) 

and for every other $\sigma$ in the set

(25) $0 \le \theta_\sigma^{(p)} \le 2$,

all $\theta_\sigma^{(p)}$ not in the set being zero. A weighted average of such values of $\theta$ will
satisfy (20), whence (6) and (7) follow.

In practice values of 0^"’ aatiafyuig (20) may be .selected ccmvenicrUly by the 
following procedure' 

(a) Select a set of ff for at least one of which 0'^'’* aatiRfin-s (23) ami for every 

pair of which = 0. In so far a.g this rastriclion pGrinlt.s choose the <r 

corresponding to the larger values of (di'’~‘')Vc» • 

(b) Determine values for each in the set approxitnaUdy equal to 1. 
Until other values are assigned to them assume all other 0^'’’ »» 0. 

(c) Choose a v not in the set, say p, for which (di'’”‘’)Vc„ is relatively largo 
and select a value for 0^"’ such that 

(26) 0'>'> i -EE {/dj»'““. 

i fftp 

(d) Having changed 0^"' from 0 to a value approximately satisfying (26), 
continue with other a not in the set letting p in (20) represent each in turn. 
The work may be terminated at any stage leaving some »» 0. 

7. Convergence of the adjustment. The condition equations may be written
in the following form

(27) $\sum_i b_{i\sigma} d_i^{(0)} = d_\sigma^{(0)}$,

as a system of consistent, but not necessarily independent, linear equations.
They may also be written as conditions on the $m_i$. The least squares adjustment
minimizes the quadratic form

(28) $S^{(0)} = \sum_i (d_i^{(0)})^2/c_i$

subject to the restraints (27). Since the $c_i$ are positive, $S^{(0)}$ is positive definite,
and therefore a minimum exists and is non-negative. The values of the $d_i^{(0)}$
that minimize $S^{(0)}$ while satisfying (27) are $m_i - n_i$, the $n_i$ being known and
the $m_i$ being the least squares adjusted values that are to be calculated.

If r is the rank of the matrix $\| b_{i\sigma} \|$, then from (15) and (16) it follows that
r of the $d_i^{(0)}$ may be expressed as linear functions of the t - r other $d_i^{(0)}$. The
latter then constitute a set of t - r independent variables. The normal
equations

(29) $\partial S^{(0)}/\partial d_h^{(0)} = 0$,

are obtained by differentiating $S^{(0)}$ with respect to each one of them in turn,
one equation resulting for each value of h corresponding to a $d_h$ in the set of
independent variables. The normal equations (29) are a system of t - r
independent linear equations and can be written in the form

(30) $\sum_i a_i(h)\, d_i^{(0)} = \sum_\sigma \beta_\sigma(h)\, d_\sigma^{(0)}$,

where the first summation is over the set of independent variables, and the
second over the $d_\sigma^{(0)}$ in the r selected condition equations. The right-hand
terms are constants. Since a least squares adjustment exists the equations are
consistent and the rank of the matrix $\| a_i(h) \|$ is t - r. Any $d_i^{(0)}$ in the set,
say $d_g^{(0)}$, is the quotient of two determinants, the divisor being the determinant
$| a_i(h) |$ and the dividend being the determinant obtained by replacing the
$a_g(h)$ by $\sum_\sigma \beta_\sigma(h)\, d_\sigma^{(0)}$. Consequently each $d_i^{(0)}$, whether in the set or not, is a
linear combination of the $d_\sigma^{(0)}$ and the sum of the absolute values of the coefficients
of the $d_\sigma^{(0)}$ is finite. Therefore

(31) $\max_i | d_i^{(0)}/\sqrt{c_i} | \le G \max_\sigma | d_\sigma^{(0)}/\sqrt{c_\sigma} |$,

where G is $(\max c_\sigma/\min c_i)^{1/2}$ times the sum of the absolute values of the
coefficients of the $d_\sigma^{(0)}$ in the linear combination for which such sum is a maximum.
From (28)

(32) $S^{(0)} \le t \max_i \{(d_i^{(0)})^2/c_i\} \le G^2 t \max_\sigma \{(d_\sigma^{(0)})^2/c_\sigma\}$,

whence

(33) $(d_\tau^{(0)})^2/c_\tau \ge S^{(0)}/(G^2 t)$.

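Consider now the discrepancies

(34) $d_i^{(p)} = m_i - m_i^{(p)} = d_i^{(p-1)} - c_i \sum_\sigma b_{i\sigma}\theta_\sigma^{(p)} d_\sigma^{(p-1)}/c_\sigma$

between the $m_i$ and the corresponding approximations $m_i^{(p)}$ and write the
quadratic form

(35) $S^{(p)} = \sum_i (d_i^{(p)})^2/c_i$.

From (16), (18), and (34)

(36) $d_i^{(0)} = d_i^{(p)} + c_i \sum_\sigma b_{i\sigma}\lambda_\sigma^{(p)}$,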

and 

(37) 


f y 


Hence the substitution of (36) in (27) merely changes (0) to (p) in the
superscripts, the new equations being consistent by definition and the corresponding r
of the $d_i^{(p)}$ being expressible as linear functions of the other t - r. Further,
(35) is positive definite and hence has a minimum. In fact, substituting (36) in
(28) we find that


(38) 


a**?'*” ^ £)«“" 
ad'“' ddi^' 


Ddji''’ 






(.S'”' 4- 2E4”'xi”‘ 


r'/.S 

.74'” 


- 0 . 


Hence a least squares solution for the $d_i^{(p)}$ exists and it leads by (36) to the same
values for the $m_i$ as does the solution for the $d_i^{(0)}$. Since the coefficients $a_i(h)$
and $\beta_\sigma(h)$ and the number G are functions of the $b_{i\sigma}$ and $c_i$ they are invariant
for the substitution. Consequently (30), (31), (32), and (33) may also be
written with (p) in place of (0) in the superscripts, (33) becoming

(39) $(d_\tau^{(p)})^2/c_\tau \ge S^{(p)}/(G^2 t)$.


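From (20), (34), and (35)

(40) $S^{(p)} = \sum_i (d_i^{(p)})^2/c_i$

$= \sum_i (d_i^{(p-1)})^2/c_i - 2\sum_i \sum_\sigma d_i^{(p-1)} b_{i\sigma}\theta_\sigma^{(p)} d_\sigma^{(p-1)}/c_\sigma + \sum_i c_i\Big(\sum_\sigma b_{i\sigma}\theta_\sigma^{(p)} d_\sigma^{(p-1)}/c_\sigma\Big)^2$

$= S^{(p-1)} - 2\sum_\sigma \theta_\sigma^{(p)}(d_\sigma^{(p-1)})^2/c_\sigma + \sum_i c_i\Big(\sum_\sigma b_{i\sigma}\theta_\sigma^{(p)} d_\sigma^{(p-1)}/c_\sigma\Big)^2$

$\le S^{(p-1)} - (d_\tau^{(p-1)})^2/(c_\tau H)$,  p > p',

and from (39)

(41) $S^{(p)} \le S^{(p')}(1 - 1/M)^{p-p'}$,

where

(42) $M = G^2 t H$.

Therefore, as $p \to \infty$, $p - p' \to \infty$, $S^{(p)} \to 0$, $d_i^{(p)} \to 0$, $m_i^{(p)} \to m_i$ and consequently
the successive adjusted frequencies obtained by an iterative process in
which condition (20) is satisfied converge to the adjusted frequencies that are
obtained by solving the normal equations.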
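The steady decrease of the quadratic form can be observed numerically. The following is a small sketch, not taken from the paper: it uses a hypothetical 2 × 2 table, obtains the limiting adjusted frequencies by running the alternating row and column corrections (every $\theta = 1$) many times, and then records the weighted squared distance to that limit after each of the first few steps. All names and data are illustrative assumptions.

```python
def row_step(m, c, row_targets):
    """One row correction with every theta equal to 1."""
    out = []
    for mi, ci, target in zip(m, c, row_targets):
        lam = (target - sum(mi)) / sum(ci)
        out.append([x + w * lam for x, w in zip(mi, ci)])
    return out


def transpose(a):
    return [list(col) for col in zip(*a)]


def col_step(m, c, col_targets):
    """One column correction, done as a row correction on the transposes."""
    return transpose(row_step(transpose(m), transpose(c), col_targets))


def weighted_distance(m, limit, c):
    """Sum of squared discrepancies from the limit, each divided by c."""
    return sum((t - x) ** 2 / w
               for mr, tr, cr in zip(m, limit, c)
               for x, t, w in zip(mr, tr, cr))


n = [[40.0, 30.0], [20.0, 10.0]]   # hypothetical sample frequencies
c = n                              # hypothetical choice c_ij = n_ij
rows, cols = [75.0, 25.0], [55.0, 45.0]

# Reference solution: iterate far beyond the point where changes are visible.
limit = n
for _ in range(200):
    limit = col_step(row_step(limit, c, rows), c, cols)

m, S = n, []
for p in range(6):
    S.append(weighted_distance(m, limit, c))
    m = row_step(m, c, rows) if p % 2 == 0 else col_step(m, c, cols)

print(S)
```

In this setting the first row step should lower the quadratic form by exactly the sum of the squared row discrepancies divided by the row totals of c, which is the decrement that the chain of equalities above gives when the corrected equations share no cell and every $\theta$ is 1.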

8. Rate of convergence. The computer is not as much interested in the
proof of convergence as he is in how rapidly the successive adjustments reach a
satisfactory degree of approximation. Equations (39) or (41) are of no help
to him. The adjustment may be made in one step, with every $\theta = 1$, (a) if
the condition equations are such that every $b_{i\sigma'} b_{i\sigma''} = 0$, $\sigma' \ne \sigma''$, i.e. if the
adjustment can be separated into one-dimensional cases when redundant condition
equations are ignored, or (b), in the two and three-dimensional cases, if
the $c_{ij}$ or $c_{ijk}$ are proportional to the $c_{i.}$ and $c_{.j}$ or to the $c_{i..}$, $c_{.j.}$, $c_{..k}$ or $c_{ij.}$,
$c_{i.k}$, and $c_{.jk}$ respectively. Except in these and other special cases the rapidity
of convergence depends on the $d_\sigma^{(0)}$ as well as on the $\| b_{i\sigma} c_i \|$ matrix. However,
it seems that one can make very little use of the $d_\sigma^{(0)}$ to determine the rapidity
of convergence without actually computing the successive adjustments or making
some equivalent calculation.
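Case (b) can be illustrated for a two-way table. The sketch below is a hedged illustration under two assumptions: the constants have the product form $c_{ij} = a_i b_j$ (so that each is proportional to its row total times its column total), and the expected marginal totals add up to the sample total. With every $\theta = 1$, a single simultaneous correction of rows and columns then reproduces both sets of expected totals. The function name and the numbers are hypothetical.

```python
def one_step(n, c, row_targets, col_targets):
    """A single simultaneous row and column correction with every theta = 1."""
    r, s = len(n), len(n[0])
    c_row = [sum(c[i]) for i in range(r)]
    c_col = [sum(c[i][j] for i in range(r)) for j in range(s)]
    d_row = [row_targets[i] - sum(n[i]) for i in range(r)]
    d_col = [col_targets[j] - sum(n[i][j] for i in range(r)) for j in range(s)]
    return [[n[i][j] + c[i][j] * (d_row[i] / c_row[i] + d_col[j] / c_col[j])
             for j in range(s)] for i in range(r)]


a, b = [1.0, 2.0], [1.0, 3.0]               # hypothetical row and column factors
c = [[ai * bj for bj in b] for ai in a]     # c_ij = a_i * b_j
n = [[10.0, 20.0], [30.0, 40.0]]            # sample total 100
# Expected totals, chosen so that they also add up to 100.
m = one_step(n, c, [35.0, 65.0], [45.0, 55.0])

print(m)
```

If the constants do not have this product form, the same single step generally leaves residual discrepancies in the margins, and the alternating procedure is needed.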

Certain results can be obtained from the $\| b_{i\sigma} c_i \|$ matrix alone. Returning
to the two-dimensional case and its notation, consider the matrix $\| c_{ij} \|$ and
define

(43) Sij = Cij Cl c ,/c, , ~ ^ c I 

Let the adjustments be made with the restriction that $\theta_{i.}^{(p)} = 0$ and $\theta_{.j}^{(p)} = 1$
when p is even, and $\theta_{i.}^{(p)} = 1$ and $\theta_{.j}^{(p)} = 0$ when p is odd. Then if p > 1

dj''” = - 2 : icijc ,)d!r'' = i: z ieje ,){c,jc; 

f44) ^ L ' ' 

= Z Z («.,/c ,) (5///C/.) 4.”“-’ (/ = 1, 2, • • ■ , r) 

I / 

The sum of the absolute values is 

(45) Z 1 d[!‘^ 1 < Z 1 d\^-^ I < hr Z I dl^' 1 

1 « i 

where 

(40) 5? = Z Z 1 Sw/c.j I 7 ) 

I ) 

y.j being the, greatest of the | 5, ,/c,. | in the jth column. Similarly for p > 2 

(47) Z I d?’ I < hi Z 1 d.',” ■ I < hi Z I dVM 

J J i 

where 

(48) 52 = Z Z U,A, 1 7„ 

4 3 

y,. being the greatest of the | 5,,/c.f\ in the ith column. 





Aasume again the conditiona just preceding (-14). Let !i,. f>e the minimum 
c<j/c,j in the ith row. Likewise let y., he the minimum ru/r,. in the jth erdumn. 
Then since Sd'.'’* = Sd',"’ = 0, 

(49) S ( d.'”' I = 2S-"d*,'’' - - 2r ,f/\ 


the -f- and — signs indicating that the last two .‘tumnmtionH are over po.‘>itivc 
and negative values of d^/’ rc.spectively. When p i.s oven, nf eruimu all values 
of d.V’ = 0. 

From (44) 

(50) d$^' = -SM.‘r7c,-=^ 2:~a,id.'r‘’l/r,--E‘c.id'; "1/r,. 

j i t 

= E1 dfr' i A-. ~ 2 E" I d\r'' 1A, 

i 

= 2 E” Ci, 1 d.?"" 1/n., - E c.-, 1 d',” iA.i 

} f 

and by (49) 

(51) I d^"’ 1 < E C, i dfr" !A., ~ K.. E1 d.? ’ 

f » 

Eld7lSEld.‘r‘’l(l-En..). 


i f 


(62) 

Similarly 

(53) E|d,?'|<2:td(.’-‘‘’|(1~ E<'() 

J < J 

Let = 1 — E w.. and In = 1 -■ E •'•r > then 
< ; 

(54) E I dS."* I < fut. E I I < (6,6,)'^‘ E 1 dl?* 

i i t 

Now 6j or h may be greater or less than 6i or f>i but, unlike 6i and Ih , they 
can not exceed unity. Let b* be the lesser of hi and btbt. Then under the 
conditions stated with equation (44) 

(66) S i d'7‘> I < S I d!,^' 1 < h*S i I < b’-^S I d'l> I < [ d,V’ I- 

It follows from (40) that 

gip) ^ ^u+i) ^ ^ (di-Y/c,. + E (d.?')Vc.^ 


( 56 ) 


«E {i:(di7Vc<. + E(d?;TA.a 

l—p I I 

^ E {(El d{?’ DVminc,-. + (E I d.‘*’ |)Vminc,/} 

h-p I f 

< (E 1 di."’ 1 + i dl”-*’ D* ((l/min cc) + (l/mmc.,)}/Cl ~ b*). 





The reduction in 5^" in g steps of the iterative process is 


J+17—I 


(57) 


D = iS'”' - = E [E id^u'^Y/c. + E (.d^Yf/c ,] 

A-p t ) 


p+/)-l 


> E [(E) i dl'*' 1)7(r max c.) + (E 1 d','*’ [)V(s max c ,)]. 

A-p 1 ; 


from which, by (55), if <7 > 1 is odd. 


(58) D > \ - (E 1 dY^"^ 1 + I d!” 1)^ (— - + - ^ - ) . 

1 — 0 t \rmaxc, s max c,,/ 


1 - b-* 

The relative decrease in is, therefore, by (56), 


(59) 


D_ 

S(p) 


D 


D + 


> 1 + 


1/min c. + 1/min c ; 


b7b 


— 






+ 


s max c.i 


If the g steps actually have been taken a better lowei limit for the relative 
decrease in 5^”’ may be obtained by computing D from (57) and using (56) 
for Similar equations can be written using bj. 

These results can be shown to be valid for an adjustment in which $\theta_{i.}^{(p)} =
\theta_{.j}^{(p)} = 1$ at the first and any of the subsequent steps. They also can be extended
to the three-dimensional cases but not to three-dimensional adjustments
with every $\theta = 1$.


9. Improvement resulting from the adjustment. The least squares adjustment
eliminates a portion of the errors of sampling, i.e. a portion of $\chi^2$, from
the set of frequencies estimated from the sample. In fact any adjustment that
satisfies the condition equations does this.

Let $\epsilon_i$ be the error in the ith value given by the sample and $\epsilon_i^{(p)}$ the error in
the pth approximation to the least squares adjusted value. Then

(GO) 5!"’= «> + c.E5.'.X7', 

and 

(61) E (s['’07c. = E Y/c. + 2 E - E c. (E o, xYy. 

t I <r 1 (T 

« 

The complete adju.stment make.s 5^’ vanish and therefore, since the last term is 
non-negative, < ^Y/ci except in the trivial case in which all d'"’ = 0. 

From (37) 

(62) E (s['7Vc. = E *7c.- -f- E X, (dY‘ - d'"’). 

V i 9 

The last term may be computed readilj'- at any stage in the iteration. If the 
.sampling is at landom, A- Se7ci is di.stributed approximately as % with t — 1 
degrees of freedom, where k is the ratio of the c, to the corresponding erro.r 





variance nf tho n,. Thorcffiu* it \utiilii >-triii 

k IK>C\ the raliH'tion in \\ n> a niKiMiif nf tin- ut at in. u-1 in 

the liiial adjufitment. 

10. The method of iterative proportions. The iterative method
described in the earlier paper [1] implicitly defines, in the two dimensional case,

(63) $m_{ij} = \mu_{i.}\, n_{ij}\, \mu_{.j}$,

the $\mu_{i.}$ and $\mu_{.j}$ being given by the r + s condition equations

(64) $\sum_j \mu_{i.}\, n_{ij}\, \mu_{.j} = m_{i.}$,  $\sum_i \mu_{i.}\, n_{ij}\, \mu_{.j} = m_{.j}$,

any r + s - 1 of which constitute a consistent system of independent equations
in r + s unknowns. One multiplier, say $\mu_{1.}$, may be fixed arbitrarily. Then
for a 2 × s table it is necessary to solve an equation of the sth degree. If s = 2,
there is only one acceptable solution, given by the positive root; if s = 3, there
is only one solution of the cubic for which all the adjusted frequencies are
non-negative. For 3 × 3 and larger tables the adjustment appears to involve the
solution of equations of the tenth or higher degree and there is then no choice
but to use methods of approximation.

The adjusted frequencies given by the method of iterative proportions are not
identical to those given by the method of least squares. When the adjustments
are small relative to the frequencies adjusted, however, the results given
by this method approximate those of least squares. For the two-dimensional
case the successive adjustments converge to a set of frequencies that satisfy the
condition equations. The author has not found a proof of convergence or
divergence for more than two dimensions.
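For comparison with the least squares adjustment, the method of iterative proportions can be sketched as follows. This is a minimal illustration assuming the usual form of the method, in which rows and then columns are rescaled in turn so that each total agrees with its expected value; the function name and the 2 × 2 data are hypothetical.

```python
def iterative_proportions(n, row_targets, col_targets, sweeps=200):
    """Rescale rows, then columns, to their expected totals, repeatedly."""
    m = [[float(x) for x in row] for row in n]
    r, s = len(m), len(m[0])
    for _ in range(sweeps):
        for i in range(r):                       # proportional row adjustment
            f = row_targets[i] / sum(m[i])
            m[i] = [x * f for x in m[i]]
        for j in range(s):                       # proportional column adjustment
            f = col_targets[j] / sum(m[i][j] for i in range(r))
            for i in range(r):
                m[i][j] *= f
    return m


n = [[40.0, 30.0], [20.0, 10.0]]                 # hypothetical sample frequencies
m = iterative_proportions(n, [75.0, 25.0], [55.0, 45.0])

# Every cell is its sample frequency times a row factor and a column factor,
# so the cross-product ratio of the table is left unchanged.
ratio_sample = n[0][0] * n[1][1] / (n[0][1] * n[1][0])
ratio_adjusted = m[0][0] * m[1][1] / (m[0][1] * m[1][0])
print(m, ratio_sample, ratio_adjusted)
```

For these data the first cell settles near 39.4, whereas a least squares adjustment with constants equal to the sample frequencies gives 39.6 for the same cell. The two methods agree in the margins but not in the cells.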

I wish to express my appreciation of many stimulating conversations with
Dr. W. Edwards Deming on this and related problems, and of the helpful
critical reading of certain portions of the manuscript by Dr. Joseph F. Daly.

REFERENCE

[1] W. Edwards Deming and Frederick F. Stephan, "On a least squares adjustment of
a sampled frequency table when the expected marginal totals are known,"
Annals of Math. Stat., Vol. 11 (1940), p. 427.



ESTIMATION OF VOLUME IN TIMBER STANDS BY STRIP SAMPLING 

By a. a. Hasel 

California Forest and Range Experiment Station^ 

1. Introduction. The present paper is the second of a proposed, series, in. 
which it is intended to present a systematic study of the properties of several 
methods of sampling timber stands and statistical treatments of the samples 

The effects of size, shape, and arrangement of sampling units on the accuracy
of sample estimates of timber stand volume were reported in the earlier paper [1]
for 5,760 acres of the Blacks Mountain Experimental Forest. With complete
inventory data, the nature of stand variation was shown to be such that 2.5-acre
plots, the smallest size tested, were more efficient sampling units than larger
plots, i.e., for a given intensity of sampling the sampling error was smaller.
Long, narrow plots were more efficient than square plots of the same size.
Line-plot sampling units consisting of two or more equally spaced plots along
lines of fixed length were as efficient as single-plot sampling units and more
efficient than strips consisting of plots contiguous end to end. Improvement in
the accuracy of estimates was obtained by subdividing the area into rectangular
blocks of equal size, and sampling each block to the same intensity. By
systematic sampling, whereby the center lines of parallel line-plot or strip sampling
units were spaced equidistant, the sample estimates of stand volume were
improved over estimates from comparable random samples. Treatment of the
volumes on individual plots of systematic samples as random sampling
observations, however, as is sometimes done in practice, was shown to give seriously
biased estimates of sampling error.

In the present paper we shall be concerned with sample estimates from strip
samples taken within blocks of irregular shape, and consequently with sampling
units which vary in length within samples. The methods will be equally
applicable to line-plot samples.

Following the general ideas expressed by Neyman [2] it is felt that: (1) If the
formulae of the theory of probability have to be applied at all to the treatment
of samples, the theoretical model of sampling must involve some element of
randomness. (2) This element of randomness may conveniently be introduced
by a random selection of the sample, but may also be assumed present in the
distribution of deviations of timber stand volumes in the area sampled from a
postulated pattern. (3) Many attempts to treat systematic arrangements
statistically are faulty because the treatment consists in applying to systematic
arrangements formulae that are deduced under the assumption of randomness.
If the arrangement of sampling is a systematic one, and random errors are
ascribed to Nature, then the treatment of the data should be based on formulae
deduced under explicit assumption of the systematic arrangement of sampling
and of some random element in the material. An example of this kind of
treatment is provided by Neyman's method of parabolic curves [2] devised for the
treatment of systematically arranged agricultural experiments. (4) Lastly, a
mathematical treatment of any practical problem is useful only if the predictions
of the theory are in satisfactory agreement with the empirical facts. Whether
the method of sampling is random or systematic, the mathematical theory of
sampling always involves certain elements that are postulated, either in respect
to the method of sampling itself or in respect to the material sampled. To have
a reasonable certainty that a particular mathematical treatment is useful in
practice it is necessary to make empirical studies to find out whether the
deviations from postulates of the theory that may occur in actual situations do or do
not seriously affect the validity of the predictions.

¹ Maintained by the U. S. Department of Agriculture at Berkeley, in cooperation with
the University of California.

2. Notation and definitions. Before proceeding to the main subject of this
paper it may be useful to explain the meaning of certain statistical terms and
symbols. Following Neyman, a sharp distinction is made between three
different conceptions that are frequently confused by the practical statistician.

Definition 1: If $u_1, u_2, \cdots, u_N$ are any fixed numbers, whether provided
by some already completed experiment involving randomness, or just
arbitrarily selected, Karl Pearson's term "standard deviation" of these numbers
and the letter $S$ will be used to denote the expression

    $S = \sqrt{\sum (u_i - \bar u)^2 / N}$

in which $\bar u = \sum u_i / N$ is the mean of the u's.

Now let $X$ denote a random variable, that is a variable the value of which is
going to be determined by a chance experiment. Thus $X$ may be the timber
volume on a strip that is going to be selected at random from an area. Denote
by $E(X)$ the mathematical expectation of variable $X$ capable of possessing values
$u_1, u_2, \cdots, u_n$. Then

    $E(X) = u_1 p_1 + u_2 p_2 + \cdots + u_n p_n$,

in which the p's are the respective probabilities of all possible different values
of $X$.

Definition 2: The words "standard error of $X$" and the letter $\sigma_x$ will be used
to denote the expression

    $\sigma_x = \sqrt{E[X - E(X)]^2}$.

It will be noticed that the standard error of a random variable $X$ may have
its value equal to the standard deviation of some numbers $u$ but that this does
not mean that the two conceptions are identical or even similar. The $E(X)$,
and consequently $\sigma_x$, can be calculated only when the probability law of $X$ is
known, and are constant for the population from which samples are drawn.
On the other hand, $S$ can be calculated for any sample of the population and
changes in value from one sample of u's to another.
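The distinction between Definitions 1 and 2 can be illustrated with a minimal Python sketch. The four strip volumes below are hypothetical (they are not taken from Table I), each is assumed equally likely to be drawn, and the function names are illustrative only.

```python
import math

def standard_deviation(u):
    """Definition 1: S = sqrt(sum((u_i - u_bar)^2) / N) for fixed numbers u."""
    u_bar = sum(u) / len(u)
    return math.sqrt(sum((ui - u_bar) ** 2 for ui in u) / len(u))

def expectation(values, probs):
    """E(X) = u_1 p_1 + u_2 p_2 + ... + u_n p_n."""
    return sum(v * p for v, p in zip(values, probs))

def standard_error(values, probs):
    """Definition 2: sigma_x = sqrt(E[X - E(X)]^2); needs the probability law of X."""
    mean = expectation(values, probs)
    return math.sqrt(sum(p * (v - mean) ** 2 for v, p in zip(values, probs)))

# Hypothetical volumes on the four strips of a small block (assumed data).
population = [30.0, 45.0, 50.0, 75.0]
probs = [0.25, 0.25, 0.25, 0.25]  # one strip drawn at random

sigma_x = standard_error(population, probs)    # a constant of the population
S_first = standard_deviation([30.0, 75.0])     # S for one possible sample
S_second = standard_deviation([45.0, 50.0])    # S for another possible sample
```

Here `sigma_x` is fixed once the population and the probability law are fixed, while `S_first` and `S_second` differ because they are computed from different samples.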





Before proceeding to the third conception, that of an estimate of the standard 
error, which is occasionally confused with the standard deviation or the standard 
error, the unbiased estimate of a parameter must be defined [3]. 

Consider a set of n random variables $X_1, X_2, \cdots, X_n$. These may be, for
example, the volumes of timber to be observed on n strips that are going to be
selected from some area by one random method or another. Denote by $\theta$ a
parameter involved in the probability law of the X's. For example, $\theta$ may be
the total volume of timber in the area.

Let F be any function of the X’s. 

Definition 3: If it happens that the mathematical expectation of $F$ is
identically equal to $\theta$, then it will be said that $F$ is an unbiased estimate of $\theta$.
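Definition 3 can be checked directly on a small population by averaging an estimate over every possible sample. The sketch below assumes simple random sampling without replacement from four hypothetical strip volumes; the two estimates compared are chosen only for illustration.

```python
from itertools import combinations

def expectation_over_samples(population, n, estimator):
    """Average an estimator over all equally likely samples of size n
    drawn without replacement."""
    samples = list(combinations(population, n))
    return sum(estimator(s) for s in samples) / len(samples)

# Hypothetical strip volumes (assumed data, not from Table I).
volumes = [30.0, 45.0, 50.0, 75.0]
N = len(volumes)
theta = sum(volumes)  # the parameter: total volume in the area

def expanded_mean(sample):
    """N times the sample mean: unbiased for theta under this sampling."""
    return N * sum(sample) / len(sample)

def expanded_max(sample):
    """N times the sample maximum: a function of the X's that is biased."""
    return N * max(sample)

E_mean = expectation_over_samples(volumes, 2, expanded_mean)
E_max = expectation_over_samples(volumes, 2, expanded_max)
```

The first expectation equals `theta`, so `expanded_mean` is an unbiased estimate in the sense of Definition 3; the second exceeds `theta`, so `expanded_max` is not.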

Usually there will be an infinity of unbiased estimates of a parameter $\theta$.
They may be classified by the nature of the function $F$. Thus linear estimates
may be considered such that

    $F = \lambda_0 + \lambda_1 X_1 + \cdots + \lambda_n X_n$

in which the $\lambda$'s stand for some fixed numbers.

Definition 4: It will be said that a linear unbiased estimate of $\theta$ is the best
linear unbiased estimate (B. L. U. E.) if its standard error is smaller than or,
at most, equal to that of any other linear unbiased estimate.

It happens frequently that, while it is possible to determine the best linear
unbiased estimate $F$ of a parameter, it is not possible to calculate the value of
its standard error, $\sigma_F$. For this purpose it would be necessary to know the
whole population sampled. In such cases an unbiased estimate of the square,
$\sigma_F^2$, is calculated. An unbiased estimate of the square of the standard error
of F will be denoted by . This is the third of the conceptions mentioned 
above. 

The reason for the extensive use of the linear unbiased estimates and of their
standard errors considered as measures of accuracy is the so-called Theorem of
Liapounoff. Its content can roughly be explained as follows: If the variables
$X_1, X_2, \cdots, X_n$ are independent and the number n not too small, then the
probability that $F - \theta$ will exceed a fixed multiple of $\sigma_F$ is approximately equal
to the probability as determined by the normal law. The above conclusion
remains true whatever the probability distribution of the X's that is likely to
be met in practice and also in certain cases where the X's are mutually dependent,
for example, when they are determined by sampling a finite population without
replacement [4].

The above conclusions do not apply to estimates that are biased in the sense
of the above definition. Also the standard error of such an estimate would not
be a satisfactory measure of its accuracy.

3. Description of data. Complete inventory data from the Blacks Mountain 
Experimental Forest, located in the Lassen National Forest, provide suitable 
material for testing the applicability of sampling theory to timber cruising. 





The timber is a virgin, all-aged stand, classed as pure pine type, with more
than 90 per cent of the volume in ponderosa pine and Jeffrey pine. [...] of
the volume is in over-mature trees, i.e., trees over 300 years in age. The stand
is considered to be fairly representative of the medium and the poor site qualities
of the northeastern California plateau.

With the exception of a few localities, all of the area was mapped as of uniform
timber type according to the standards commonly used. Being fairly uniform
also with respect to site quality, it may therefore be considered as a single
stratum. Variability of stand volume from place to place within a stratum may
be generally expected to be less, on the average, than variability between places in
different strata. Likewise, within a stratum, variability within compact
subdivisions may be expected to be less than average variability within the whole.
Heterogeneity can therefore be controlled somewhat by subdividing the stratum
into blocks and treating each block as a separate population.

More frequently than not, in practice, volume estimates are needed both for
the total timbered area and for separate working units or compartments within
the area. In general, working unit boundaries are defined by roads, ridge tops,
drainage channels, and regular land subdivision lines. These working units
can be taken conveniently as blocks, or if large enough, may be subdivided into
two or more blocks. Such is the basis used for subdividing the area in the
present study.

The complete inventory data for these blocks are given in Table I. All the
strips are 2½ chains in width and extend in an east-west direction. The length,
$X$, is given in 10-chain units, and the volume, $Y$, is given in units of 1,000 feet
board measure.

4. Method of estimation based on correlation between volume and strip
length. The usual practice in sampling timber stand volume is to take
measurements on plots or strips that are either regularly spaced or selected at random
from all possible plots or strips within blocks. Oftener than not blocks are
irregular in shape, and the number of plots along lines or the lengths of strips
will vary. This variation introduces the matter of proper "weighting" in
calculating sample statistics. Such is the case in 16 of the 20 Blacks Mountain blocks.

If we let $Y_i$ represent the volume on the ith strip of length $X_i$, with length
expressed say in 10-chain units, and assume that the entire block contains a
population of $N$ strips, then the average volume to the unit of strip is

    $\beta = \sum_{i=1}^{N} Y_i \Big/ \sum_{i=1}^{N} X_i$.

It is obvious that, if $\sum X_i$ is known, and this is assumed to be
true, the problem of estimating $\beta$ is equivalent with that of estimating the total
volume. The usual procedure of estimating is this:

Out of the $N$ strips within the block a sample of n is taken, giving n pairs of
numbers selected out of the X's and Y's. Let us denote them by

    $x_1, y_1;\ x_2, y_2;\ \cdots;\ x_n, y_n$.





The ratio $b = \sum y_i / \sum x_i$ is then considered an estimate of $\beta$, so that the
estimate of the total volume in the block is $b \sum_{i=1}^{N} X_i$.
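The usual procedure just described can be written out as a short Python sketch. The sample values and the total strip length below are hypothetical, and the function name is illustrative.

```python
def ratio_estimate_total(sample_x, sample_y, total_length):
    """Usual ratio estimate: b = sum(y_i) / sum(x_i), and the estimated
    total volume b * sum(X_i), where sum(X_i) is the known total length
    of all N strips in the block."""
    b = sum(sample_y) / sum(sample_x)
    return b, b * total_length

# Hypothetical sample of n = 2 strips (lengths in 10-chain units,
# volumes in 1,000 feet board measure); total strip length assumed known.
b, total = ratio_estimate_total([2.0, 4.0], [60.0, 100.0], total_length=30.0)
```

Note that only the sum of the strip lengths in the block needs to be known, not the individual volumes.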

Our purpose now will be to study the above estimate $b$ from the point of view
of unbiasedness. In this paper it is assumed that the sampling of strips is
purely random. To find out whether $b$ is an unbiased estimate or not, its
expectation must be calculated. This will be done in two steps. To begin
with assume that the values $x_1, x_2, \cdots, x_n$ are chosen in one way or another and
fixed. The value of $b$ will then depend on the y's only. It is possible that to a
given value of $x$, say $x_1$, there will correspond just one value of $y_1$ in the block,
but generally there will be several strips of the same length with varying
volumes of timber. The selection of any strip of this group to be included in
the sample will keep the denominator of $b$ constant, but will cause some variation
in the numerator. The expectation of $b$ calculated under the assumption that
the x's are fixed is

(1)    $E(b \mid x_1, x_2, \cdots, x_n) = \sum E(y_i \mid x_i) \Big/ \sum x_i$,

in which $E(y_i \mid x_i)$ denotes the expectation of $y_i$ calculated under the
assumption that $x_i$ has a fixed value. Obviously $E(y_i \mid x_i)$ will be what is called the
regression function of $y$ on $x$, or of volume on the length of strip.

It is safe to say that the graph of $E(y \mid x)$ would almost always be rather
irregular. On the other hand, it is known that the substitution of smooth curves
representing the regressions for the true irregular polygons frequently gives
results that are surprisingly accurate. Therefore it would not be unreasonable
to use the assumption that $E(y \mid x)$ can be represented by a polynomial of some
moderate degree,

    $E(y \mid x) = A_0 + A_1 x + A_2 x^2 + \cdots + A_s x^s$.

Substitution of this expression in (1) gives

    $E(b \mid x_1, x_2, \cdots, x_n) = A_0 \frac{n}{\sum x_i} + A_1 + A_2 \frac{\sum x_i^2}{\sum x_i} + \cdots + A_s \frac{\sum x_i^s}{\sum x_i}$.

But this conditional expectation of $b$, calculated under the assumption of
fixed x's, is only an intermediate stage in the calculations. We need an absolute
expectation, calculated under the assumption that the x's are selected at random.
This gives

(2)    $E(b) = A_0 E\left(\frac{n}{\sum x_i}\right) + A_1 + A_2 E\left(\frac{\sum x_i^2}{\sum x_i}\right) + \cdots + A_s E\left(\frac{\sum x_i^s}{\sum x_i}\right)$.


TABLE I

Complete inventory data for 15 blocks of the Blacks Mountain Experimental Forest

[The body of the table is not recoverable from this copy. For each block the
table lists, by strip number, the strip length X in 10-chain units and the
volume Y in units of 1,000 feet board measure, followed by the block totals.
Strips are numbered in order from north to south within blocks.]

The value of $\beta$ has the form of (2), except that instead of $E\left(\sum x_i^k / \sum x_i\right)$ it contains
$\sum X_i^k / \sum X_i$. Since in general the former does not necessarily equal the latter,
for the unbiasedness of $b$ it is necessary and sufficient that $A_0 = A_2 = \cdots = A_s = 0$.
This condition implies that the regression line of $y$ on $x$ is a straight
line and passes through the origin of coordinates,

(3)    $E(y \mid x) = A_1 x$.

Whether (3) is satisfied is a question of fact and can be authoritatively
answered only by direct studies of regressions on some extensive inventory data.
It may be noted also that in order to presume that (3) is usually satisfied, it
should be established for a large number of areas. On the contrary, if a study
of only a few areas shows that (3) is not true, then it would not be wise to take
it for granted when attempting to make a sampling inventory of an
unfamiliar area.

To investigate this point, linear regression equations of volume on the length
of strip were calculated for 15 blocks of the Blacks Mountain Experimental
Forest and it was found that the constant terms were both positive and negative
with their absolute values varying from 12 to 077. The conclusion drawn is
that the usual estimate $b$ of the average volume per unit of strip is likely to be
biased and that there is justification in looking for an alternative method leading
to unbiased estimates.
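The role of the constant term can be seen by enumerating every possible sample from a small block. The sketch below uses two hypothetical populations of five strips: one whose volumes lie on a line through the origin, as in (3), and one with a positive constant term. The numbers are assumptions chosen for illustration, not values from Table I.

```python
from itertools import combinations

def expected_ratio(strips, n):
    """E(b) over all equally likely samples of n strips drawn without
    replacement; strips is a list of (X, Y) pairs."""
    samples = list(combinations(strips, n))
    ratios = [sum(y for _, y in s) / sum(x for x, _ in s) for s in samples]
    return sum(ratios) / len(ratios)

def beta(strips):
    """Population value: total volume divided by total strip length."""
    return sum(y for _, y in strips) / sum(x for x, _ in strips)

lengths = [1.0, 2.0, 3.0, 4.0, 6.0]                          # hypothetical lengths
through_origin = [(x, 25.0 * x) for x in lengths]            # E(y|x) = A1 x
with_intercept = [(x, 40.0 + 25.0 * x) for x in lengths]     # A0 = 40, not zero

bias_origin = expected_ratio(through_origin, 2) - beta(through_origin)
bias_intercept = expected_ratio(with_intercept, 2) - beta(with_intercept)
```

In the first population every sample gives the same ratio, so the bias is zero; in the second the expectation of `b` differs from `beta`, which is the situation the regression studies of the 15 blocks point to.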

5. Best linear unbiased estimate of volume, based on the linear regression of
volume on length of strip. In this section will be suggested a method of
estimating the total volume, say $\theta$, of a timber stand, which could be considered
as an improvement on the one considered above. The new method consists in
using a linear unbiased estimate of $\theta$. In order to deduce the form of this
estimate, certain assumptions have to be taken for granted concerning the
timber stands to be sampled, and if it happens that these assumptions are
unsatisfied in a particular case, the new estimate will not necessarily possess the
desired property of unbiasedness.

In deducing the estimate $F$ it will be assumed that the timber stand to be
sampled satisfies the following conditions: (1) That the regression of timber
volume on length of strip, $X$, be (approximately) linear and (2) that the
variability of the Y's for a fixed $X$ is precisely known. It will not be assumed,
however, that the linear regression line passes through the origin of coordinates,
and this will allow $F$ to be unbiased in such cases, as exhibited above, where $b$ is
biased. Following the Markoff method [5], [3] it can easily be shown that there
is an infinity of linear estimates of $\theta$ which are unbiased under condition (1).
It follows that a choice can be made among them so as to diminish the standard
error. This, however, is possible only when something is known about the
variability of the Y's when the value of $X$ is fixed. For the present we shall
assume condition (2) concerning this particular point, but in practice this will
generally be quite impossible. This point will be considered further in Section 6.





Consider then a sure or non-random variable* $X$ able to assume the particular
values $X_1, X_2, \cdots, X_s$. Assume that there is a finite population $\pi_i$ of $N_i$
numbers $u_{i1}, u_{i2}, \cdots, u_{iN_i}$, corresponding to each value $X_i$, $i = 1, 2, \cdots, s$.
Assume that the mean $\bar u_{i.}$ of the population $\pi_i$ is a linear function of $X_i$, i.e.,
for any $i$,

    $\bar u_{i.} = A + B X_i$,

with some unknown values of $A$ and $B$.

Assume that out of each population $\pi_i$ there is selected without replacement
a random sample of $n_i$ individuals, with $0 < n_i \le N_i$, and denote by
$y_{i1}, y_{i2}, \cdots, y_{in_i}$ the values of the u's to be drawn.

If the regression of the amount of timber on the length of the strip is linear,
then the problem of estimating the total stand is equivalent to that of estimating

    $\theta = \sum_{i=1}^{s} N_i (A + B X_i) = A \sum_{i=1}^{s} N_i + B \sum_{i=1}^{s} N_i X_i$.

Since the length of the strip, $X$, could be measured from any arbitrarily chosen
origin, no generality will be lost by assuming $\sum_{i=1}^{s} N_i X_i = 0$, so that
$\theta = A \sum_{i=1}^{s} N_i = AN$ (say). Weighting the $y_{ij}$ equally for each fixed $i$ the B. L. U. E.
of $\theta$ may be denoted by

(4)    $F = \sum_{i=1}^{s} \lambda_i n_i \bar y_{i.}$,

in which $\bar y_{i.} = \sum_j y_{ij} / n_i$. Here the $\lambda$'s must satisfy the conditions of
unbiasedness,

(5)    $E(F) = \theta$,

and of optimum,

(6)    $\sigma_F^2 = \text{minimum}$.

It may easily be shown that condition (5) will be fulfilled by (4) if the $\lambda$'s are
so selected that

(7)    $\sum_{i=1}^{s} n_i \lambda_i = N; \qquad \sum_{i=1}^{s} n_i \lambda_i X_i = 0$.


* This is an English translation from an excellent French term "nombre certain" and
"fonction certaine" to denote a non-random number and non-random function, invented
by Fréchet.





Condition (6) may now be considered. From the general formula for the
variance of a linear function of several random variables and the fact that $y_{ij}$
is independent of $y_{kl}$ when $i \ne k$,

    $\sigma_F^2 = \sum_{i=1}^{s} \lambda_i^2 \{ n_i E(y_{i1} - \bar u_{i.})^2 + n_i (n_i - 1) E[(y_{i1} - \bar u_{i.})(y_{i2} - \bar u_{i.})] \}$

(8)    $= \sum_{i=1}^{s} \frac{S_i^2}{N_i - 1} (n_i N_i \lambda_i^2 - n_i^2 \lambda_i^2)$

    $= \sum_{i=1}^{s} \frac{n_i (N_i - n_i)}{N_i - 1} S_i^2 \lambda_i^2$,

in which $S_i^2$ stands for the (S.D.)² of the population $\pi_i$, i.e.,

    $S_i^2 = \frac{1}{N_i} \sum_{j=1}^{N_i} (u_{ij} - \bar u_{i.})^2$.

In addition to satisfying equations (7), the $\lambda_i$'s must be selected so as to
minimize (8).
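Each term of (8) is the variance of a sample total drawn without replacement from one of the finite populations. A minimal Python sketch, using a hypothetical population of four values, compares the closed form with a direct enumeration of all samples; the function names are illustrative.

```python
from itertools import combinations

def sd_squared(u):
    """(S.D.)^2 of a finite population, with divisor N as in the text."""
    mean = sum(u) / len(u)
    return sum((x - mean) ** 2 for x in u) / len(u)

def variance_of_sample_total(u, n):
    """Variance of n * y_bar over all equally likely samples of size n
    drawn without replacement, by direct enumeration."""
    totals = [sum(s) for s in combinations(u, n)]
    mean = sum(totals) / len(totals)
    return sum((t - mean) ** 2 for t in totals) / len(totals)

def closed_form(u, n):
    """The factor multiplying lambda_i^2 in the last line of (8):
    n (N - n) / (N - 1) * S^2."""
    N = len(u)
    return n * (N - n) / (N - 1) * sd_squared(u)

population = [30.0, 45.0, 50.0, 75.0]   # hypothetical volumes, one strip length
enumerated = variance_of_sample_total(population, 2)
formula = closed_form(population, 2)
```

The two values agree, and both fall to zero when `n` equals the population size, which is why an exhausted population contributes nothing to (8).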

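Using the method of Lagrange, we find

(9)    $\lambda_i = \frac{1}{A_i} (\alpha + \beta X_i)$,

for the case where $0 < n_i < N_i$ and $A_i \ne 0$, in which
$A_i = (N_i - n_i) S_i^2 / (N_i - 1)$ is the coefficient of $n_i \lambda_i^2$ in (8). If $n_i = N_i$, then $A_i = 0$ and

    $\alpha + \beta X_i = 0$.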

Assume first that all $n_i < N_i$, $i = 1, 2, \cdots, s$. Then $\alpha$ and $\beta$ are obtained
from equations (7) after substituting in them (9), namely

(10)    $\alpha \sum w_i + \beta \sum w_i X_i = N$
        $\alpha \sum w_i X_i + \beta \sum w_i X_i^2 = 0$,

where, for simplicity

(11)    $w_i = \frac{n_i}{A_i} = \frac{(N_i - 1) n_i}{(N_i - n_i) S_i^2}; \qquad \sum_{i=1}^{s} w_i = W$.

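If $w_i$ is considered as the weight of the observations at $X = X_i$, it will be
convenient to introduce a weighted mean and weighted S.D. of X's as follows:

(12)    $\bar x = \frac{\sum w_i X_i}{W}; \qquad S_x^2 = \frac{\sum w_i X_i^2}{W} - \bar x^2$.

With this notation equations (10) can be rewritten and easily give

(13)    $\alpha = \frac{N (S_x^2 + \bar x^2)}{W S_x^2}; \qquad \beta = -\frac{N \bar x}{W S_x^2}$.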





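Substituting these values into (4), simple transformations give

(14)    $F = N(\bar y - \bar x b_0)$,

in which $\bar y = \sum w_i \bar y_{i.} / W$, and $b_0$ represents the unbiased estimate of $B$ and is
given by

    $b_0 = \left[ (1/W) \sum_i w_i X_i \bar y_{i.} - \bar x \bar y \right] / S_x^2$.

The next step is to calculate $\sigma_F^2$. Substituting (9) in (8) and using (11) and
(12) gives

    $\sigma_F^2 = W (\alpha + \beta \bar x)^2 + W \beta^2 S_x^2$.

Using (13) gives finally

(15)    $\sigma_F^2 = \frac{N^2}{W} \left[ 1 + \frac{\bar x^2}{S_x^2} \right]$.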

If $X$ is the length of a given strip in any chosen units and $\bar X$ the average of
such X's for a given block, then (14) and (15) may be written

(16)    $F = N[\bar y + b_0 (\bar X - \bar x)]; \qquad \sigma_F^2 = \frac{N^2}{W} \left[ 1 + \frac{(\bar X - \bar x)^2}{S_x^2} \right]$.
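Formulae (11), (12), and (16) can be collected into a short Python sketch. It assumes that every $n_i$ is smaller than $N_i$ and that the $S_i^2$ are known, which the text notes is not the case in practice; the group values below are hypothetical and the function name is illustrative.

```python
def blue_total(groups):
    """Best linear unbiased estimate of the block total and its variance,
    following (11), (12) and (16).

    groups: list of (X_i, N_i, n_i, S2_i, ybar_i) with 0 < n_i < N_i,
    where S2_i is the (assumed known) (S.D.)^2 of volumes at length X_i.
    """
    N = sum(Ni for _, Ni, _, _, _ in groups)
    X_bar = sum(Ni * Xi for Xi, Ni, _, _, _ in groups) / N
    # (11): weight of the observations at X = X_i
    w = [(Ni - 1) * ni / ((Ni - ni) * S2) for _, Ni, ni, S2, _ in groups]
    W = sum(w)
    # (12): weighted mean and weighted (S.D.)^2 of the X's, and weighted mean of y
    x = sum(wi * g[0] for wi, g in zip(w, groups)) / W
    y = sum(wi * g[4] for wi, g in zip(w, groups)) / W
    Sx2 = sum(wi * g[0] ** 2 for wi, g in zip(w, groups)) / W - x ** 2
    b0 = (sum(wi * g[0] * g[4] for wi, g in zip(w, groups)) / W - x * y) / Sx2
    # (16)
    F = N * (y + b0 * (X_bar - x))
    var_F = N ** 2 / W * (1 + (X_bar - x) ** 2 / Sx2)
    return F, var_F

# Hypothetical block with three strip lengths; the sample means are placed
# exactly on the line 10 + 20 X so that the true total is known.
groups = [
    (2.0, 5, 2, 100.0, 50.0),
    (4.0, 8, 3, 150.0, 90.0),
    (6.0, 4, 2, 225.0, 130.0),
]
F, var_F = blue_total(groups)
```

With the sample means lying exactly on a straight line, `F` reproduces the total implied by that line, $N(A + B\bar X)$, whatever the weights; the weights affect only the variance.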


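Similarly for the case where one of the $n_i$'s equals $N_i$, for example, $n_1 = N_1$,
we find

(17)    $F = N[\bar y_{1.} + b (\bar X - X_1)]$,

in which

    $b = \frac{\sum_{i=2}^{s} w_i (X_i - X_1)(\bar y_{i.} - \bar y_{1.})}{\sum_{i=2}^{s} w_i (X_i - X_1)^2}$.

Also

(18)    $\sigma_F^2 = \frac{N^2 (\bar X - X_1)^2}{\sum_{i=2}^{s} w_i (X_i - X_1)^2}$.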

It should be emphasized that $X_1$ in the above formulae does not necessarily
represent the smallest of the X's but the one of them for which $n_i = N_i$.

The case where two or more of the $n_i$'s are respectively equal to the
corresponding $N_i$'s need not be considered in detail. Together with the assumption
of a strict linearity of regression such an assumption, for example, that $n_1 = N_1$
and $n_2 = N_2$, would lead to the conclusion that the regression of volume on the
length of strip is accurately known and that the estimation of $\theta$ could be made
without error. Owing to the fact that the hypothesis about the linearity of
regression is, at best, only approximately correct, the errors of estimation will
always be present and it is imperative either to arrange the sampling so as to
have at most one of the $n_i$'s equal to the corresponding $N_i$, or to base the
statistical treatment of the sample on a theory different from the one
considered here.


6. Additional hypotheses concerning $S_i^2$. The formulae (16) with the $w_i$'s
determined by (11) are impossible to apply in practice because we do not know
the values of the $S_i^2$. The best we can do is to make plausible guesses as to
what may be the values of the $S_i^2$. These guesses are bound to be at most
approximately correct and therefore the estimates of $\theta$ that one can apply in
practice will be only "approximately best." It is easy to see, however, that we
may keep them unbiased.

Suppose that we denote by $\tau_i^2$ the presumed value of $S_i^2$. Substituting this
value in place of $S_i^2$ in (8) we should repeat all the calculations, leading us to
such $\lambda_i'$ that will assure the unbiasedness of, say

    $F_0 = \sum_{i=1}^{s} n_i \lambda_i' \bar y_{i.}$,

and also a minimum value of, say

    $v_0^2 = \sum_{i=1}^{s} \frac{n_i (N_i - n_i)}{N_i - 1} \tau_i^2 \lambda_i'^2$.

The values of the $\lambda_i'$ will be obtained from the same formulae as those of $\lambda_i$,
except that instead of $S_i^2$ they will depend on $\tau_i^2$. Consequently $F_0$ will have
the same form as $F$,

(19)    $F_0 = N[\bar y' + b_0' (\bar X - \bar x')]$,

with the difference that $\bar x'$, $\bar y'$, $S_x'^2$, and $b_0'$ will now have to be calculated with
different weights, say

    $v_i = \frac{(N_i - 1) n_i}{(N_i - n_i) \tau_i^2}; \qquad \sum_{i=1}^{s} v_i = V$.

If the form of the unbiased estimate is as that of $F$, the square of its standard
error is more complicated. In order to calculate it we have to go back to (8)
and substitute into it the new values of $\lambda_i'$ obtained from the guessed weights $\tau_i^2$.
With

(20)    $\lambda_i' = \frac{N_i - 1}{(N_i - n_i) \tau_i^2} (\alpha' + \beta' X_i)$,

    $\alpha' = \frac{N (S_x'^2 + \bar x'^2)}{V S_x'^2}; \qquad \beta' = -\frac{N \bar x'}{V S_x'^2}$,

we have

(21)    $\sigma_0^2 = \sum_{i=1}^{s} \frac{n_i (N_i - n_i)}{N_i - 1} S_i^2 \lambda_i'^2 = \sum_{i=1}^{s} v_i \rho_i (\alpha' + \beta' X_i)^2$,

where $\rho_i = S_i^2 / \tau_i^2$. It will now be helpful to introduce notation for another
kind of weighted mean and weighted (S.D.) of the X's, with weights equal to
$v_i \rho_i$. So let us write

(22)    $\bar x'' = \frac{\sum v_i \rho_i X_i}{\sum v_i \rho_i}; \qquad S_x''^2 = \frac{\sum v_i \rho_i X_i^2}{\sum v_i \rho_i} - \bar x''^2$.
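Expanding (21) and using (20), we have

(23)    $\sigma_0^2 = \frac{N^2 \sum v_i \rho_i}{V^2} \left\{ \left[ 1 + \frac{\bar x' (\bar x' - \bar x'')}{S_x'^2} \right]^2 + \frac{\bar x'^2 S_x''^2}{S_x'^4} \right\}$.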

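Formula (23) refers to the case where the X's are measured from their
population mean, $\bar X$. In order to reduce it to the case where the X's are given in
their original values we have to substitute $(\bar x' - \bar X)$ for $\bar x'$ and $(\bar x'' - \bar X)$ for $\bar x''$.
Thus

(24)    $\sigma_0^2 = \frac{N^2 \sum v_i \rho_i}{V^2} \left\{ \left[ 1 + \frac{(\bar x' - \bar X)(\bar x' - \bar x'')}{S_x'^2} \right]^2 + \frac{(\bar x' - \bar X)^2 S_x''^2}{S_x'^4} \right\}$.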


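Applying a similar procedure to the case where $n_1 = N_1 = 1$, but $n_i < N_i$
for $i = 2, 3, \cdots, s$, we easily find

(25)    $F_0 = N \left[ \bar y_{1.} - (X_1 - \bar X) \frac{\sum v_i (X_i - X_1)(\bar y_{i.} - \bar y_{1.})}{\sum v_i (X_i - X_1)^2} \right]$

and

(26)    $\sigma_0^2 = N^2 (X_1 - \bar X)^2 \frac{\sum v_i \rho_i (X_i - X_1)^2}{\left[ \sum v_i (X_i - X_1)^2 \right]^2}$.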


This formula will help us to test the appropriateness of guesses about the
values of $S_i^2$. It will be noticed that the $\lambda$'s contain $S_i^2$ or $\tau_i^2$ in the same powers
in the numerator and in the denominator. It follows that all we need to guess
is a system of numbers proportional to $S_i^2$ and not the $S_i^2$ themselves. Our
problem will be to test a few such guesses on the data of the Blacks Mountain
Experimental Forest and see which of them gives generally a smaller value of $\sigma_0^2$.

TABLE II

Values of $S_i^2$

[The body of the table is not recoverable from this copy.]

Table II gives values of the $S_i^2$, calculated for 15 blocks, together with the
corresponding $X_i$. In a few cases $N_i = 1$ and consequently $S_i = 0$. These
cases are not included in the table. Using the values of $S_i^2$ from Table II and
assuming systems of the $n_i$'s, the values of $\sigma_F^2$ were calculated for these blocks.
These would be the true (S.E.)² of the best linear estimates of the total timber
volume in each block, but it would never be possible to calculate them from
sample data.

The $\sigma_0^2$'s were calculated using the following guesses concerning the $S_i^2$:
(1) That they do not depend on $X_i$, (2) that they are proportional to $X_i$,
and (3) that they are proportional to $\sqrt{X_i}$. The ratios $\sigma_F^2 / \sigma_0^2$ for all blocks
taken together were found to be .770 for guess (1), .769 for guess (2), and .777
for guess (3). It is seen that, on the average, the guess that the $S_i^2$ are
proportional to $X_i$ gives the smallest average value of $\sigma_0^2$. It is interesting, however,
to note that the differences between the three guesses are, for all practical
purposes, negligible.

Ratios like $\sigma_F^2 / \sigma_0^2$ are sometimes described as the "amount of information" in
$F_0$ as compared with that in the best linear unbiased estimate $F$. This
expression was introduced by R. A. Fisher. In certain cases it has the following
property which justifies the term used: Let n be the size of the sample which
serves for calculating $F_0$, then, if it were possible to calculate the best linear
unbiased estimate $F$, the same accuracy of estimation would be obtained by
using a smaller sample size $n \sigma_F^2 / \sigma_0^2$. In the case considered in the present paper
the above circumstance does not occur. Still, the ratio $\sigma_F^2 / \sigma_0^2$ seems to be
convenient to describe the situation.
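The comparison of guesses can be sketched in Python from formulae (20) and (21), with strip lengths measured from their population mean. The three groups below are hypothetical and are not the Blacks Mountain values, so the ratios the sketch produces are not those reported above; the function names are illustrative, and every $n_i$ is assumed smaller than $N_i$.

```python
def variance_under_guess(groups, tau2):
    """Formula (21): true variance of the estimate built from guessed tau2_i.

    groups: list of (X_i, N_i, n_i, S2_i) with 0 < n_i < N_i;
    tau2:   guessed values, needed only up to a common factor.
    """
    N = sum(Ni for _, Ni, _, _ in groups)
    X_bar = sum(Ni * Xi for Xi, Ni, _, _ in groups) / N
    d = [Xi - X_bar for Xi, _, _, _ in groups]   # lengths measured from X_bar
    v = [(Ni - 1) * ni / ((Ni - ni) * t)
         for (_, Ni, ni, _), t in zip(groups, tau2)]
    V = sum(v)
    x1 = sum(vi * di for vi, di in zip(v, d)) / V
    Sx2 = sum(vi * di ** 2 for vi, di in zip(v, d)) / V - x1 ** 2
    alpha = N * (Sx2 + x1 ** 2) / (V * Sx2)      # as in (20)
    beta = -N * x1 / (V * Sx2)
    rho = [S2 / t for (_, _, _, S2), t in zip(groups, tau2)]
    return sum(vi * ri * (alpha + beta * di) ** 2
               for vi, ri, di in zip(v, rho, d))

def efficiency(groups, tau2):
    """Ratio sigma_F^2 / sigma_0^2 for a given system of guesses."""
    true_S2 = [S2 for _, _, _, S2 in groups]
    return variance_under_guess(groups, true_S2) / variance_under_guess(groups, tau2)

# Hypothetical groups: (X_i, N_i, n_i, S2_i).
groups = [(2.0, 5, 2, 100.0), (4.0, 8, 3, 150.0), (6.0, 4, 2, 225.0)]
eff_constant = efficiency(groups, [1.0, 1.0, 1.0])                 # guess (1)
eff_linear = efficiency(groups, [X for X, _, _, _ in groups])      # guess (2)
eff_sqrt = efficiency(groups, [X ** 0.5 for X, _, _, _ in groups]) # guess (3)
eff_exact = efficiency(groups, [3.0 * S2 for _, _, _, S2 in groups])
```

No guess can give a ratio above one, and a guess that is exactly proportional to the true $S_i^2$ gives a ratio of one, in line with the remark that only the proportions need to be guessed.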

7. Another scheme for estimating $\theta$. It will be noticed that the ignorance
of what are the $S_i^2$ is not the only circumstance which makes it difficult to apply
the above formulae. There is also another one connected with the values of $N_i$.
We have $N_i = 1$ in several blocks and for several different strip lengths. True,
this might have been avoided by defining block boundaries in such a way that
$N_i \ge 2$, but it was considered best to conform strictly to the practical situation
where the $N_i$'s may be smaller. In such cases we may include in our sample
all the strips of a given length, say $X_1$. If we apply to such samples the above
formulae, deduced under the explicit assumption that the regression of $Y$ on $X$
is strictly linear, we shall force the fitted regression line through the point
$(X_1, \bar Y_{1.})$. As the assumption of strict linearity is obviously not exact and the
exhaustion of strips of length $X_1$ is possible only when there are very few such
strips, the whole procedure may lead to serious inaccuracies in the final estimate.
One safeguard against this is never to exhaust strips of any given length when
dealing with formulae deduced from finite populations.

The fact that the true regression point $(X_1, \bar Y_{1.})$ does not actually lie on a
straight line makes it uncertain whether taking into account the finiteness of
populations of strips of the same length is beneficial to the accuracy of the
final estimate. In the preceding sections we worked on the assumption that
there is but a finite number of strips of the same length and on an inaccurate
assumption that the regression is strictly linear. In the present section the first
assumption will be dropped, having in mind that the effect of the inaccuracy of
the second assumption may thereby be reduced.

The assumption that each of the $N_i$ is infinite will be made only in deducing
the $\lambda_i$ and will be reflected in weights. Formula (11) will now reduce to
$w_i = n_i / S_i^2$. If we assume further that $S_i^2 = X_i^\gamma / k$, where $\gamma$ and $k$ are some
constants, then

    $w_i = \frac{k n_i}{X_i^\gamma}; \qquad W = k \sum_i \frac{n_i}{X_i^\gamma}$,

and the final estimate is

(27)    $F = N[\bar y + b_0 (\bar X - \bar x)]$.


The square of the standard error of $F$ has again the same form as in (16),

(28)    $\sigma_F^2 = \frac{N^2}{W} \left[ 1 + \frac{(\bar X - \bar x)^2}{S_x^2} \right]$,

the only differences being in $W$, $\bar x$, and $S_x^2$. If $\gamma = 0$, so that the $S_i^2$ are assumed
to be constant, then

    $w_i = k n_i; \qquad W = k \sum_i n_i$,

and all the symbols $\bar x$, $\bar y$, and $S_x^2$ assume their customary meaning of ordinary
means and of ordinary (S.D.)².

It would be easy to deduce explicit formulae for $\gamma = 1/2$, etc., but they are
not elegant and, if the necessity arises, the calculations could be carried through
by starting with $w_i = n_i / X_i^\gamma$. The omission of $k$ does not influence the form of $F$.
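Under the assumption of infinite $N_i$ the estimate (27) can be computed directly from the sampled strips, each strip carrying the weight $1/X^\gamma$, so that the strips of one length together carry $n_i / X_i^\gamma$. The Python sketch below follows that reading; the sample, the block size `N`, and the mean length `X_bar` are hypothetical, and the function name is illustrative.

```python
def regression_estimate(sample, N, X_bar, gamma=0.0):
    """Estimate (27), F = N [y_bar + b0 (X_bar - x_bar)], with weights
    proportional to 1 / X**gamma for each sampled strip.

    sample: list of (x, y) pairs for the sampled strips;
    N:      number of strips in the block;
    X_bar:  mean strip length in the block (assumed known).
    The constant k is omitted since it does not affect F.
    """
    wts = [x ** -gamma for x, _ in sample]
    W = sum(wts)
    x_bar = sum(w * x for w, (x, _) in zip(wts, sample)) / W
    y_bar = sum(w * y for w, (_, y) in zip(wts, sample)) / W
    Sx2 = sum(w * x * x for w, (x, _) in zip(wts, sample)) / W - x_bar ** 2
    b0 = (sum(w * x * y for w, (x, y) in zip(wts, sample)) / W
          - x_bar * y_bar) / Sx2
    return N * (y_bar + b0 * (X_bar - x_bar))

# Hypothetical sample whose volumes lie exactly on the line 10 + 20 x.
sample = [(2.0, 50.0), (2.0, 50.0), (4.0, 90.0), (6.0, 130.0)]
estimates = [regression_estimate(sample, 17, 66.0 / 17.0, g)
             for g in (0.0, 0.5, 1.0)]
```

With `gamma=0` the weighted quantities are ordinary means and an ordinary least squares slope. For data lying exactly on a line, all three values of `gamma` give the same estimate; they differ only when the sample scatters about the line.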

The question whether the combination of one true hypothesis about the $N_i$
being finite, with another incorrect one that the regression is strictly linear, is
better than that of two incorrect hypotheses, will be studied by means of a
sampling experiment in Section 9.


8. Unbiased estimates of $\sigma_F^2$. While it may not be unreasonable to hope
that a guess of a system of numbers proportional to the $S_i^2$ may be successful,
it is entirely hopeless to try to guess the actual values of the $S_i^2$. It follows
that, if it is desired to obtain from the sample some sort of measure of the
accuracy of $F$, we have to calculate an estimate of $\sigma_F^2$.

We shall treat the problem by assuming that the regression of Y on X is 
strictly linear and that the are proportional to XJ and that the W, are all 
finite It will be noticed that they will enter the formulae by means of the 



VOLUME IN TIMBER STANDS 


195 


ratios $(N_i - 1)/(N_i - n_i)$. If it is desired to obtain formulae referring to the
assumption of infinite $N_i$'s, it will be sufficient to replace these ratios by unity.
Of course, the symbol $N$ will always represent the total number of strips in the
actual block on which it is desired to estimate the volume of timber and will
not be affected by the assumption of the $N_i$'s being infinite.

On these assumptions

$E(y_{ij}) = A + BX_i,$

$\sigma_i^2 = E(y_{ij} - A - BX_i)^2 = S_i^2 = kX_i^\gamma,$

with some value of $\gamma$ supposed to be accurately guessed, which however need
not be specified, and with an unknown factor of proportionality, $k$. The square
of the standard error of $\bar{y}_{i\cdot}$ is then known to be


(29)  $\sigma_{\bar{y}_{i\cdot}}^2 = \frac{S_i^2}{n_i}\,\frac{N_i - n_i}{N_i - 1} = \frac{kX_i^\gamma(N_i - n_i)}{n_i(N_i - 1)}.$


The right-hand member of (29) is equal to the reciprocal of what we have
formerly denoted by $w_i$ and described as the weight of the observations at
$X = X_i$. We have mentioned above that the formula giving $F$ does not
depend on the values of the $w_i$, but on proportions between the $w_i$. In other
words, if we drop the unknown factor $k$ and denote by $w_i$ the ratio


(30)  $w_i = \frac{n_i(N_i - 1)}{X_i^\gamma(N_i - n_i)},$


which involves only known quantities, these new weights will lead to exactly
the same value of $F$ as the original weights. It will now be convenient to alter
the definition of weight and use formula (30). With this new meaning of $w_i$,
(29) could be rewritten $\sigma_{\bar{y}_{i\cdot}}^2 = k/w_i$.

Let us further use the letter $m$ to denote the number of those $X_i$'s for which
we have at least one observation. In other words $m$ will be the number of
different lengths of strips in the sample and also the number of different $\bar{y}_{i\cdot}$'s
that we are going to calculate from it.

Now let us go back to formula (16) giving the square of the standard error
of $F$. We notice that, while $\bar{x}$ and $S_x^2$ do not depend on the unknown factor of
proportionality, $k$, the sum $W$ of the original weights does depend on it and with
our new meaning of $w_i$,

$W = \frac{1}{k}\sum_{i=1}^m w_i.$

It follows that $\sigma_F^2$ should now be written in the form

(31)  $\sigma_F^2 = \frac{kN^2}{\sum_{i=1}^m w_i}\left[1 + \frac{(\bar{X} - \bar{x})^2}{S_x^2}\right],$



196


A. A. HASEL


and that, in order to estimate $\sigma_F^2$ it is sufficient to get an estimate of $k$. We
easily get an unbiased estimate of $k$ by merely applying the second part of the
Markoff Theorem [3]. According to it an unbiased estimate of $k$, based on
$m - 2$ degrees of freedom is given by the ratio


(32)  $\frac{\sum_{i=1}^m w_i[\bar{y}_{i\cdot} - \bar{y} - b_0(X_i - \bar{x})]^2}{m - 2},$


in which $\bar{y}$ and $b_0$ are calculated according to the assumptions made regarding
$N_i$ and $\gamma$. It may be expected, however, that the estimate (32) will not be a
very accurate one because the number of degrees of freedom on which it is based
may be very small.

In an attempt to find a better estimate of $k$ we shall proceed by analogy and
calculate the expectation of a sum similar to the one in the numerator of (32)
but depending explicitly on the particular $y_{ij}$'s, namely of

$S_0^2 = \sum_{i=1}^m \sum_{j=1}^{n_i} \frac{w_i}{n_i}[y_{ij} - \bar{y} - b_0(X_i - \bar{x})]^2.$

It will be noticed that if the $N_i$ are finite, $y_{ij}$ and $y_{ik}$ are dependent and that the
Theorem of Markoff does not apply to $S_0^2$. Introduce the notation

$\eta_{ij} = y_{ij} - A - BX_i,$

$\bar{\eta}_{i\cdot} = \frac{1}{n_i}\sum_{j=1}^{n_i}\eta_{ij} = \bar{y}_{i\cdot} - A - BX_i.$

Easy, but somewhat long calculations show that $S_0^2$ can be rewritten in the form

$S_0^2 = \sum_{i=1}^m \sum_{j=1}^{n_i} \frac{w_i}{n_i}\eta_{ij}^2 - \frac{1}{\sum_{i=1}^m w_i}\left(\sum_{i=1}^m w_i\bar{\eta}_{i\cdot}\right)^2 - \frac{1}{S_x^2\sum_{i=1}^m w_i}\left[\sum_{i=1}^m w_i(X_i - \bar{x})\bar{\eta}_{i\cdot}\right]^2,$


which is most convenient for calculating the expectation sought. We notice
first that

$E(\eta_{ij}^2) = kX_i^\gamma,$

$E(\bar{\eta}_{i\cdot}^2) = \sigma_{\bar{y}_{i\cdot}}^2 = \frac{k}{w_i}.$

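Further, as $\bar{y}_{i\cdot}$ and $\bar{y}_{j\cdot}$ are mutually independent if $i \ne j$, the same is true for
$\bar{\eta}_{i\cdot}$ and $\bar{\eta}_{j\cdot}$. It follows that

$E(\bar{\eta}_{i\cdot}\bar{\eta}_{j\cdot}) = 0, \qquad i \ne j.$

Consequently

$E\left(\sum_{i=1}^m w_i\bar{\eta}_{i\cdot}\right)^2 = E\left(\sum_{i=1}^m w_i^2\bar{\eta}_{i\cdot}^2\right) = \sum_{i=1}^m w_i^2E(\bar{\eta}_{i\cdot}^2) = k\sum_{i=1}^m w_i.$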





Similarly and for the same reason

$E\left[\sum_{i=1}^m w_i(X_i - \bar{x})\bar{\eta}_{i\cdot}\right]^2 = kS_x^2\sum_{i=1}^m w_i.$


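It follows that

$E(S_0^2) = k\left[\sum_{i=1}^m \frac{n_i(N_i - 1)}{N_i - n_i} - 2\right],$

and that the ratio

(33)  $\frac{S_0^2}{\sum_{i=1}^m \frac{n_i(N_i - 1)}{N_i - n_i} - 2} = \frac{\sum_{i=1}^m \sum_{j=1}^{n_i} \frac{w_i}{n_i}[y_{ij} - \bar{y} - b_0(X_i - \bar{x})]^2}{\sum_{i=1}^m \frac{n_i(N_i - 1)}{N_i - n_i} - 2}$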

is an unbiased estimate of $k$. In cases where either all $n_i = 1$ or all $N_i$ are
infinite the denominator of (33) reduces to the number of degrees of freedom
in $S_0^2$, equal to $\sum n_i - 2$. In other cases the denominator of (33) is greater
than the number of degrees of freedom. Whether the numerical difference is
large or small depends on the fractions $(N_i - 1)/(N_i - n_i)$. We may expect
that in many practical cases it will be small.
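The weights (30) and the ratio (33) can be sketched in Python as follows. This is a minimal illustration, not the author's computing scheme: the names `fpc_ratio`, `weights` and `estimate_k` are hypothetical, an infinite $N_i$ is represented by `math.inf`, the weighted means and slope are assumed to be those of a weighted least squares fit with the weights (30), and every $n_i$ is assumed smaller than $N_i$.

```python
import math
from typing import List, Sequence, Tuple


def fpc_ratio(N_i: float, n_i: int) -> float:
    """(N_i - 1)/(N_i - n_i); replaced by unity when N_i is treated as infinite."""
    if math.isinf(N_i):
        return 1.0
    if n_i >= N_i:
        raise ValueError("this sketch assumes n_i < N_i for every strip length")
    return (N_i - 1) / (N_i - n_i)


def weights(X: Sequence[float], n: Sequence[int], N: Sequence[float],
            gamma: float) -> List[float]:
    """Weights of formula (30): w_i = n_i (N_i - 1) / (X_i**gamma (N_i - n_i))."""
    return [ni * fpc_ratio(Ni, ni) / Xi**gamma for Xi, ni, Ni in zip(X, n, N)]


def estimate_k(X: Sequence[float], y: Sequence[Sequence[float]],
               N: Sequence[float], gamma: float = 0.0) -> Tuple[float, float]:
    """Return (estimate of k by ratio (33), denominator of (33)).

    y[i] holds the individual strip volumes observed at length X[i].
    """
    n = [len(obs) for obs in y]
    w = weights(X, n, N, gamma)
    W = sum(w)
    ybar_i = [sum(obs) / len(obs) for obs in y]
    x_w = sum(wi * Xi for wi, Xi in zip(w, X)) / W
    y_w = sum(wi * yi for wi, yi in zip(w, ybar_i)) / W
    sxx = sum(wi * (Xi - x_w) ** 2 for wi, Xi in zip(w, X))
    b0 = sum(wi * (Xi - x_w) * yi for wi, Xi, yi in zip(w, X, ybar_i)) / sxx
    # S_0^2: each observation at X_i carries the weight w_i / n_i
    S0_sq = sum(
        wi / ni * (yij - y_w - b0 * (Xi - x_w)) ** 2
        for wi, ni, Xi, obs in zip(w, n, X, y)
        for yij in obs
    )
    denom = sum(ni * fpc_ratio(Ni, ni) for ni, Ni in zip(n, N)) - 2
    return S0_sq / denom, denom
```

With all $N_i$ infinite the returned denominator equals $\sum n_i - 2$; with finite $N_i$ and some $n_i > 1$ it is larger, in line with the remark above.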

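We shall write

$S_y^2 = \sum_{i=1}^m \sum_{j=1}^{n_i} \frac{w_i}{n_i}(y_{ij} - \bar{y})^2 \Big/ \sum_{i=1}^m w_i,$

$r = \frac{\sum_{i=1}^m w_i(X_i - \bar{x})\bar{y}_{i\cdot} \Big/ \sum_{i=1}^m w_i}{S_xS_y}.$

It follows that

$S_0^2 = \sum_{i=1}^m w_i \cdot S_y^2(1 - r^2).$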


Substituting this formula into (33) and then the result of this substitution in
place of $k$ in (31), we finally get

(34)  $\mu_F^2 = \frac{N^2S_y^2(1 - r^2)}{\sum_{i=1}^m \frac{n_i(N_i - 1)}{N_i - n_i} - 2}\left[1 + \frac{(\bar{X} - \bar{x})^2}{S_x^2}\right].$


The case where one of the $n_i$ is equal to the corresponding $N_i$, e.g., where
$n_1 = N_1 = 1$, is treated in a similar manner. Using formula (18) and the notation
adopted above, we can write

$\sigma_F^2 = k\,\frac{N^2(X_1 - \bar{X})^2}{\sum_i w_i(X_i - X_1)^2}.$





The unbiased estimate of $\sigma_F^2$ will differ from this expression in that instead
of the unknown factor $k$ it will contain its unbiased estimate. To find this
estimate we proceed exactly as above and calculate the expectation of

$S_1^2 = \sum_{i=2}^m \sum_{j=1}^{n_i} \frac{w_i}{n_i}[y_{ij} - y_1 - b_0(X_i - X_1)]^2,$

with

$b_0 = \frac{\sum_i w_i(X_i - X_1)(\bar{y}_{i\cdot} - y_1)}{\sum_i w_i(X_i - X_1)^2}.$

The unbiased estimate of $\sigma_F^2$ is

(35)  $\mu_F^2 = \frac{S_1^2}{\sum_{i=2}^m \frac{n_i(N_i - 1)}{N_i - n_i} - 1}\cdot\frac{N^2(X_1 - \bar{X})^2}{\sum_i w_i(X_i - X_1)^2}.$

The number of degrees of freedom, $f$, on which $\mu_F^2$ is based is

$f = \sum n_i - 1.$


9. Empirical tests of the preceding theory. Applications of any mathematical
theory involve certain assumptions about the phenomena studied that are not
exactly true. In order to have a reasonable hope that the predictions of the
theory will be comparable to the actual facts, we must perform empirical tests
and see whether such deviations from the assumptions underlying the theory as
are usually met in practice influence materially or not the working of a given
theory. Our object in the present section will be to test whether and to what
extent such deviations influence the applicability of the theory. For that purpose
it will be useful to enumerate the more important uses of the theory that
are likely to be made.

The first point refers to the choice of the standard error $\sigma_F$ of the best linear
estimate $F$ as the measure of accuracy with which $F$ estimates the unknown
volume of timber, $\theta$. If all the assumptions were true, the Theorem of Liapounoff
would guarantee that, when the size of the sample, $\sum n_i$, is only moderately
large, the frequency distribution of the ratio

(36)  $(F - \theta)/\sigma_F$

would be very approximately normal about zero with unit S.E. If this were
actually true then the value of $\sigma_F$ would be a justifiable basis for the choice
between various alternative estimates of $\theta$. However, the discrepancies between
the hypotheses underlying the theory and the actual facts may easily produce a
bias in $F$, or may deprive $\sigma_F$ of the above important property.





Therefore, the first thing that we have to test is whether in such conditions 
as are actually met in practice the ratio (36) is in fact distributed in repeated 
sampling in a way that is comparable with the normal law. The data of the 
100 per cent survey of the Blacks Mountain Experimental Forest will serve us 
for the test. 

The second important application of the theory is connected with the use
of $\mu_F$. The purpose of calculating $\mu_F$ is to characterize the accuracy of the value
of $F$ obtained from the sample. The most appropriate way of doing so is to
calculate the confidence interval for $\theta$. This has the form [5]

(37)  $F - t_\alpha\mu_F \le \theta \le F + t_\alpha\mu_F$

in which $t_\alpha$ denotes the "Student"-Fisher $t$ taken in accordance with the number
of degrees of freedom in $\mu_F^2$ and the chosen value of $P$. The confidence interval
has the property that, if calculated for a great number of samples, the frequency
with which the true value of $\theta$ will lie between the limits $F \pm t_\alpha\mu_F$ will approach
the value $\alpha = 1 - P$ defined as the confidence coefficient.

The above statement concerning the confidence coefficient is strictly true if,
apart from the various hypotheses that were enumerated, the distribution of the
$y$'s is normal. As a result of a theorem by Kozakiewicz [6] the same statement
will be approximately true also for non-normally distributed $y_{ij}$'s, on condition
that the sample sizes are considerable. In the situation where the above theory
is to be applied all these assumptions are not satisfied. Still the formula for the
confidence interval may well work, but before accepting this we have to have
some experimental evidence. The crucial point that it must cover is whether
the ratio, say

(38)  $t = (F - \theta)/\mu_F,$

does or does not follow in repeated sampling a distribution which is sufficiently
close to the theoretical one, known as "Student's" distribution. If the empirical
distribution of $t$ does approach "Student's" law, then the frequency of correct
statements concerning $\theta$ in the form (37) will be approximately equal to the
chosen $\alpha$, and conversely.

The $n_i$'s for this experiment were fixed according to the systems shown in
Table III, with all $X$'s having a chance of appearing in the samples, and the
$n_i$'s quite closely proportional to the $N_i$'s and approximately 25 per cent of the
latter. Random sampling numbers [7] were used in making the selections of $n_i$
strips out of any group of $N_i$ strips. A total of 150 block samples were drawn,
equally distributed among the 15 blocks.

There are 95 samples for the case where all $n_i < N_i$. For these, formula (19)
was used to calculate $F$ and formula (24) for $\sigma_F^2$, using the guess that the $S_i^2$
are constant over all strip lengths. On the hypothesis that the ratio (36) is
normally distributed about zero with unit S.E., we divide the range of variation
of possible values of (36) into 20 intervals such that, if the hypothesis is true,
then the probability of an observed value falling in any particular interval is



TABLE III


Systems of $n_i$'s for sampling experiment



equal to .05. For 95 samples then, the expected frequency in each interval is
4.75. The observed frequencies are shown in Table IV.
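The subdivision into 20 intervals of equal probability can be sketched in Python with the standard library. The constant and function names below are hypothetical; the sketch takes the hypothetical distribution to be the unit normal, as for the ratio (36), and numbers the intervals from 1 to 20.

```python
from bisect import bisect_right
from statistics import NormalDist

NUM_INTERVALS = 20

# Boundaries a_1, ..., a_19 cutting the unit normal into 20 parts of probability .05
BOUNDS = [NormalDist().inv_cdf(k / NUM_INTERVALS) for k in range(1, NUM_INTERVALS)]


def interval_index(ratio: float) -> int:
    """Number (1..20) of the equal-probability interval containing the ratio."""
    return bisect_right(BOUNDS, ratio) + 1


def expected_frequency(num_samples: int) -> float:
    """Expected count per interval when the hypothesis is true."""
    return num_samples / NUM_INTERVALS
```

For the 95 samples considered here `expected_frequency(95)` gives 4.75, the figure quoted in the text.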






The agreement between the observed and the hypothetical distribution is
tested by means of the fourth order smooth test for goodness of fit [8]. The test
is designed so as to be particularly sensitive to such deviations from the
hypothetical distribution that could be described as "smooth." It is used here
because it is expected that, if the actual distribution of the ratios considered

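TABLE IV


Frequency distributions* of $(F - \theta)/\sigma_F$ and $(F - \theta)/\mu_F$ calculated under various assumptions


Columns, left to right (each column lists the observed frequencies $n_k$):

Assumption of finite population of strips
  (1) $(F - \theta)/\sigma_F$, all $n_i < N_i$
  (2) $(F - \theta)/\sigma_F$, one $n_i = N_i$
  (3) $(F - \theta)/\mu_F$, all $n_i < N_i$
  (4) $(F - \theta)/\mu_F$, one $n_i = N_i$

Assumption of infinite population of strips
  (5) $(F - \theta)/\mu_F$, all $n_i < N_i$
  (6) $(F - \theta)/\mu_F$, one or more $n_i = N_i$
  (7) $(F - \theta)/\mu_F$, total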

5 

11 

4 

9 

3 

4 

7 

3 

1 

5 

1 

4 

2 

6 

5 

2 


2 

7 

2 

9 

8 

1 

4 

0 

2 

1 

3 

3 

2 

4 

2 

5 

4 

9 

4 


5 

3 

4 

3 

7 

8 


6 

2 

6 

0 

6 

3 


4 

2 

5 

4 

9 

3 


5 

2 

8 

2 

10 

7 



1 

5 

3 

8 

1 

0 

3 

0 

4 


4 

4 

1 

3 


4 

4 

8 

6 

1 

5 

2 

7 

5 

12 

5 

1 

0 

3 

3 

7 

10 

5 

1 

0 

1 

7 

1 

8 

2 

1 


3 

9 

5 

11 

5 

2 



6 

1 

7 

4 

0 



4 


4 

10 

1 


2 

2 

6 

8 

4 

6 

1 

2 



1 

Total          95      44      95      44              55      150

$\psi_4^2$     1.326   33.812  6.463   13.091

$P$            .85     <.01    .25     .01

$P(\chi^2)$    .57     <.01    .63     .09

does differ markedly from the normal or from "Student's" one, then still the
curve representing this actual distribution would be a "smooth" one, presumably
with a single mode, and would cross the hypothetical curves at only a few
points. There is empirical evidence to show [9] that in such cases the smooth
test of fourth order is more powerful than the usual $\chi^2$ test.

* By 20 intervals of equal probability.



























The criterion used in the smooth test of the fourth order is denoted by $\psi_4^2$.
If the hypothesis tested is true, then $\psi_4^2$ is distributed, approximately, as $\chi^2$
with 4 degrees of freedom. To calculate $\psi_4^2$ we proceed as follows:

Let $x$ be a random variable and $H$ denote the hypothesis that the distribution
of $x$ is given by a perfectly specified function $f(x)$. The range of variation of $x$
is divided into $2s = 20$ intervals,

$(-\infty, a_1), (a_1, a_2), \cdots, (a_{2s-1}, +\infty),$

so that, if $H$ is true then the probability of $x$ falling within any such interval is
exactly equal to .05. Such a subdivision can frequently be made easily from
appropriate tables for $f(x)$. We associate with these intervals a variate $z$ whose
value corresponding to the $k$th interval will be

$z_k = \frac{2k - 1}{4s} - \frac{1}{2} = \frac{2(k - s) - 1}{4s}, \qquad k = 1, 2, \cdots, 2s.$


It will be seen that if we start at the point $a_s$ and follow up the intervals to the
right and to the left, then the corresponding values of $z$ will be

$z = \pm\frac{1}{4s}, \pm\frac{3}{4s}, \cdots, \pm\frac{2s - 1}{4s}.$


Consideration of the variable $z$ is then substituted for that of the observed
values $x_1, x_2, \cdots, x_n$ of $x$. If any value $x_m$ falls in the $k$th interval
$a_{k-1} < x_m < a_k$, then this is interpreted as an observation of $x$ which yielded the
value $z_k$. Let $n_k$ denote the number of observed $x$'s which fall in the interval
$(a_{k-1}, a_k)$ and let the Gaussian symbol $[z^i]$ stand for the sum
$[z^i] = \sum_{k=1}^{2s} n_kz_k^i$.

To apply the fourth order smooth test such sums have to be calculated for
$i = 1, 2, 3, 4$. Then they are substituted into the equations below, deduced
under the assumption that the number of intervals of subdivision of the range
of $x$ is equal to $2s = 20$.

$u_1^2 = n^{-1}(3.468440[z])^2,$

$u_2^2 = n^{-1}(13.500884[z^2] - 1.122261n)^2,$

$u_3^2 = n^{-1}(53.857548[z^3] - 8.031507[z])^2,$

$u_4^2 = n^{-1}(218.148007[z^4] - 46.239587[z^2] + 1.139500n)^2.$

Finally $\psi_4^2 = u_1^2 + u_2^2 + u_3^2 + u_4^2$. If the calculated value of $\psi_4^2$ exceeds the
tabled value of $\chi^2$ with four degrees of freedom, corresponding to the chosen
level of significance, then the hypothesis tested, $H$, should be rejected.*

* The above expressions for the $u$'s differ a little from those published in the original
paper on the smooth test because in the latter the test was designed to apply only to
ungrouped observations. The present formulae obtained in the Statistical Laboratory of the
University of California appear in print for the first time. Obviously if the number of
intervals $2s$ is increased, the formulae for grouped data will approach those for
ungrouped ones.
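The computation of $\psi_4^2$ from the grouped frequencies can be sketched in Python. The numerical coefficients are those of the four equations above for $2s = 20$; the function names are hypothetical, and the closed form used for the tail probability of $\chi^2$ with 4 degrees of freedom, $e^{-x/2}(1 + x/2)$, is the standard one rather than anything taken from the paper.

```python
import math
from typing import List, Sequence


def z_values(s: int = 10) -> List[float]:
    """z_k = (2(k - s) - 1) / (4s) for k = 1, ..., 2s."""
    return [(2 * (k - s) - 1) / (4 * s) for k in range(1, 2 * s + 1)]


def psi4_squared(counts: Sequence[int]) -> float:
    """Fourth order smooth test criterion for 20 equal-probability intervals.

    counts[k-1] is n_k, the number of observations in the k-th interval.
    """
    if len(counts) != 20:
        raise ValueError("the coefficients below hold only for 2s = 20 intervals")
    n = sum(counts)
    z = z_values(10)
    # Gaussian sums [z], [z^2], [z^3], [z^4]
    s1, s2, s3, s4 = (
        sum(nk * zk**i for nk, zk in zip(counts, z)) for i in (1, 2, 3, 4)
    )
    u1 = 3.468440 * s1
    u2 = 13.500884 * s2 - 1.122261 * n
    u3 = 53.857548 * s3 - 8.031507 * s1
    u4 = 218.148007 * s4 - 46.239587 * s2 + 1.139500 * n
    return (u1**2 + u2**2 + u3**2 + u4**2) / n


def chi2_tail_4df(x: float) -> float:
    """P(chi-square with 4 degrees of freedom exceeds x)."""
    return math.exp(-x / 2) * (1 + x / 2)
```

A perfectly uniform set of frequencies gives a criterion of practically zero, and `chi2_tail_4df(1.326)` is about .86, in agreement with the $P$ of .85 quoted for the first column of Table IV.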





The agreement between the observed distribution and the expected distribution
is shown to be excellent in Table IV, the probability of a greater difference
occurring through errors of random sampling alone being .85. The corresponding
$P$ for the $\chi^2$ test, where consecutive pairs of intervals are combined to make
10 intervals in all, is .57.

For the case where one $n_i = N_i = 1$ there are 44 samples. For these samples
the value of $F$ was calculated from formula (25), and the value of $\sigma_F^2$ from
formula (26), again taking the values of the $S_i^2$ as being constant over strip
length within blocks. In this case the deviation from expectation shown in
Table IV is greater than can be attributed to chance alone. These results are
also obtained by the $\chi^2$ test, which gives $\chi^2 = 25.091$ and $P < .01$ on 9 degrees
of freedom.

The conclusions we draw from these results, where one of the assumptions
made is that the population of strips is finite, are that the block boundaries
should be so defined that all $N_i > 1$, or if this is not done, that the systems of
$n_i$'s be such that no sampling is done from strips where $N_i = 1$. The fact that
some $n_i = 0$ when the corresponding $N_i > 1$ has no appreciable effect on the
working of the theory. In the previously described test for samples in which
$n_i < N_i$ the $N$ used in formulae (19) and (24) always referred to all strips in the
block, regardless of the fact that strips of some specified lengths $X_i$ did not
appear in particular samples.

Using the same samples, the distribution of $(F - \theta)/\mu_F$ is compared, in a
manner parallel to that described above, to the distribution of the "Student"-
Fisher $t$, taking into account the number of degrees of freedom.

The formulae used for estimating $\sigma_F$, namely for calculating $\mu_F$, are (34)
where all $n_i < N_i$, and (35) where one $n_i = N_i$. The estimates of $\theta$, namely $F$,
remain unchanged from those previously calculated.

The results from this second application of the theory as judged by the smooth
test in Table IV lead to the same conclusions as were made from the first application
of the theory, namely, that under the assumption that the population of
strips is finite no $N_i$ should be exhausted in the sampling.

It is interesting to note that the application of the $\chi^2$ test to the observed
distribution of $(F - \theta)/\mu_F$ corresponding to samples with one $n_i = N_i = 1$,
did not reject the hypothesis that it follows "Student's" law. In this case the
range of $t$ was divided into 10 intervals of equal probability and the value of $\chi^2$
obtained was 15.091. With 9 degrees of freedom this gives $P$ of the order of .09.
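The two values of $P$ quoted for the $\chi^2$ test on 9 degrees of freedom can be checked with a short Python sketch. It uses the standard finite series for the tail probability of $\chi^2$ with an odd number of degrees of freedom (an error-function term plus a sum of odd powers of $\chi$); the function name is hypothetical and nothing here is taken from the paper's own computing methods.

```python
import math


def chi2_tail_odd_df(x: float, df: int) -> float:
    """P(chi-square with an odd number df of degrees of freedom exceeds x)."""
    if df < 1 or df % 2 == 0:
        raise ValueError("this sketch handles odd degrees of freedom only")
    chi = math.sqrt(x)
    # Two-sided normal tail, 2Q(chi)
    normal_tail = math.erfc(chi / math.sqrt(2))
    # Unit normal density at chi
    density = math.exp(-x / 2) / math.sqrt(2 * math.pi)
    # Sum of chi^(2r-1) / (1*3*...*(2r-1)) for r = 1, ..., (df-1)/2
    term, total = chi, 0.0
    for r in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * r + 1)
    return normal_tail + 2 * density * total
```

Under these assumptions `chi2_tail_odd_df(15.091, 9)` comes out close to .09 and `chi2_tail_odd_df(25.091, 9)` falls below .01, matching the statements in the text.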

The ratio (36) cannot be determined under the assumption that the population
of strips is infinite where one $n_i = N_i$ because the values of $S_i^2$ cannot be
obtained for such strips. Under this assumption it is impossible to calculate
$\sigma_F$ by the formulae deduced in the present study and the first use of the theory
must be omitted. However, the estimate of $\sigma_F$ from samples can be calculated
and the ratio (38) determined.

The estimates $F$ were calculated using formula (27), taking $w_i = n_i$. This
same formula applies whether or not one or more of the $N_i$ are exhausted. Each
sample from Block 15 and one sample from Block 12 exhausted two or more
strip lengths and their estimates could not be calculated under the assumptions
made heretofore, but these can now be obtained under the present assumptions.
The estimates $\mu_F$ were obtained from (34) for all samples, taking the $S_i^2$ as
constant over all strip lengths and $w_i = n_i$. The fact that one or more $N_i$
are exhausted does not change the procedure for such samples in any way.

For the case where all $n_i < N_i$ in Table IV, the value of $P = .06$ obtained by
the $\psi_4^2$ test indicates that the agreement of the observed distribution with
expectation, although not close, is acceptable. When the data are regrouped
into 10 classes and the $\chi^2$ test is applied, we get $P = .18$ on 9 degrees of freedom.

The $\psi_4^2$ test applied to the distribution of $(F - \theta)/\mu_F$ for samples where one
or more $n_i = N_i$ indicates that the correspondence with expectation is good.
This result is in marked contrast to the corresponding results in previous tables
and bears out the belief previously expressed in Section 7, based on intuitive
considerations, that by dropping the assumption of finiteness of number of
strips of a given length, the error of the assumption of strict linearity of regression
would be compensated for to some extent. On the basis of these findings we
can add the conclusion that if, in sampling, the number of strips of a given
length are exhausted, the assumption of finiteness should be dropped and the
sample estimates calculated from formulae deduced under the assumption that
all $N_i$ are infinite.

There remains some question as to statistical treatment of samples in which
all $n_i < N_i$, that is, whether to use formulae deduced for finite or infinite
populations. The final choice can best be based on the relative size of the confidence
interval (37). Where all $n_i = 1$ the estimates are the same under both
assumptions. For estimates of all blocks taken together the finite population
estimates tended to be within 5.6 percent of $\theta$ in 95 out of 100 trials, while the
corresponding percentage for infinite population estimates was 6.0. We therefore
conclude that it is better to use the assumption of finiteness of $N_i$ where all
$n_i < N_i$.

The method of sampling considered here is what could be called restricted
random. The restriction consists in that we group together the sampling
units of the same size, select nonrandomly several such groups, and only then
proceed to draw at random $n_i$ units of a group of $N_i$. Frequently the strips
of the same size will be situated within the block close to one another. In those
cases the restricted sampling considered will assure that the sample will contain
elements more or less uniformly distributed over the area of the block.

10. Summary. Several methods of sampling timber stands and statistical 
treatment of the samples were considered. Data from a complete inventory of 
the Blacks Mountain Experimental Forest served for testing the methods in 
practice. 

It was found that the usual method of estimating from strip samples taken 
within nonrectangular blocks of timber gave biased estimates, unless the linear 





regression of volume on strip length passed through the origin of coordinates. 
It was shown that this condition was not a safe one to assume. Consequently 
methods of estimation were sought which were freed from this restriction. 

The appropriate formulae for the best linear unbiased estimates were deduced 
under various combinations of the following assumptions. 

(1) That the regression of timber volume on strip length is strictly linear, but 
may or may not pass through the origin of coordinates. 

(2) That the values of the (S.D.)$^2$ of timber volumes on strips of equal lengths
are (a) constant for different strip lengths, (b) proportional to strip
length, and (c) proportional to the square root of strip length.

(3) That the number of strips of a given length in each block is (a) finite, 
and (b) infinite. Assumption (b) was based on intuitive considerations 
which indicated that this assumption, though known to be false, might 
compensate for another false assumption, namely, that of strict linearity 
of regression. 

It was empirically found that assumption (b) of (2) gave better results than
either (a) or (c). However, the advantage was small and, in the author's
opinion, did not justify the extra labor in calculations which are simpler when
assumption (a) is made. Therefore all other calculations were made on that
assumption.

An extensive sampling experiment was made to test whether the smallness
of the samples, combined with the conflicts between assumptions of the theory
and the actual facts, influenced the validity of the normal theory.

Whenever the sample did not exhaust strips of a given length, it was found 
that the formulae based on the assumptions that the populations of such strips 
are finite and that they are infinite both work satisfactorily, generating distribu¬ 
tions similar to those determined by the normal theory. However, the confidence 
intervals based on the true assumption that the populations of strips of equal 
length are finite, proved to be narrower. Consequently, whenever the sample 
does not exhaust all strips of any given length in the block, the true hypothesis 
concerning the number of such strips should be used. Formulae (19) and (34) 
are therefore the appropriate ones, using weights based on finite populations. 

In cases where the sample did exhaust the strips of a given length, the treatment
of the number of such strips as finite, combined with the inaccuracy of the
assumption that the regression of timber volume on length of strip is linear, 
resulted in marked disagreement between the actual distributions of statistics 
and those based on normal theory. This disagreement was not found to exist 
in statistics calculated with formulae (27) and (34) used on the assumption 
of an infinity of strips of a given length. This suggests the conclusion that the 
exhaustion of strips of a given length by the sample should be avoided and, 
when this is impossible, then the formulae based on the assumption of an infinity 
of strips of a given length should be used. 

The formulae deduced can be applied equally well to line plots as to strips. 
With the formulae deduced the most efficient sampling will be obtained when 








TABULATION OF THE PROBABILITIES FOR THE RATIO OF THE 
MEAN SQUARE SUCCESSIVE DIFFERENCE 
TO THE VARIANCE 

By B. I. Hart 

Ballistic Research Laboratory, Aberdeen Proving Ground 
with a note 

By John von Neumann 


In recent publications von Neumann has determined the distribution of
the ratio of the mean square successive difference to the variance, for odd
values of the sample size $n$¹ and for even values of $n$.² In this paper the
probability function, i.e., the integral of the distribution, is evaluated for specific
values of $n$.

Let $x$ be a stochastic variable normally distributed with mean $\xi$ and the standard
deviation $\sigma$. The following customary definitions for the sample are:
the mean, $\bar{x} = \frac{1}{n}\sum_{\mu=1}^{n}x_\mu$; the variance, $s^2 = \frac{1}{n}\sum_{\mu=1}^{n}(x_\mu - \bar{x})^2$;
and the mean square successive difference, $\delta^2 = \frac{1}{n-1}\sum_{\mu=1}^{n-1}(x_{\mu+1} - x_\mu)^2$. Letting
$\frac{\delta^2}{s^2} = \frac{2n}{n-1}(1 - \epsilon)$, von Neumann shows that the distribution of $\epsilon$, $\omega(\epsilon)$, is
symmetrical with zero mean and intercepts equal to $\pm\cos\frac{\pi}{n}$ (loc. cit.,¹ p. 372),

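and that $\omega(\epsilon)$ is determined for odd values of $n$ by

$\frac{d^{\frac{1}{2}(n-1)-1}}{d\epsilon^{\frac{1}{2}(n-1)-1}}\,\omega(\epsilon) = \pm\frac{[\frac{1}{2}(n-1) - 1]!}{\pi}\,\frac{1}{\sqrt{-\prod_{\mu=1}^{n-1}\left(\epsilon - \cos\frac{\mu\pi}{n}\right)}}$

in the odd intervals

$\cos\frac{\pi}{n} \ge \epsilon \ge \cos\frac{2\pi}{n}$, $\cos\frac{3\pi}{n} \ge \epsilon \ge \cos\frac{4\pi}{n}$, $\cdots$, $\cos\frac{(n-2)\pi}{n} \ge \epsilon \ge \cos\frac{(n-1)\pi}{n}$,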

¹ John von Neumann, "Distribution of the ratio of the mean square successive difference
to the variance," Annals of Math. Stat., Vol. 12 (1941), pp. 367-395.

² John von Neumann, "A further remark on the distribution of the ratio of the mean
square successive difference to the variance," Annals of Math. Stat., Vol. 13 (1942), pp. 86-88.




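and by $\frac{d^{\frac{1}{2}(n-1)-1}}{d\epsilon^{\frac{1}{2}(n-1)-1}}\,\omega(\epsilon) = 0$ in the even intervals

$\cos\frac{2\pi}{n} \ge \epsilon \ge \cos\frac{3\pi}{n}$, $\cos\frac{4\pi}{n} \ge \epsilon \ge \cos\frac{5\pi}{n}$, $\cdots$, $\cos\frac{(n-3)\pi}{n} \ge \epsilon \ge \cos\frac{(n-2)\pi}{n}$

(loc. cit.,¹ pp. 389-390).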
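For orientation, the ratio $\delta^2/s^2$ and the derived quantity $\epsilon$ can be computed for a sample with the following Python sketch. The function names are hypothetical, and the variance is assumed to be taken with divisor $n$, which is what the relation $\delta^2/s^2 = \frac{2n}{n-1}(1 - \epsilon)$ with a zero-mean $\epsilon$ requires.

```python
from typing import Sequence


def von_neumann_ratio(x: Sequence[float]) -> float:
    """Ratio of the mean square successive difference to the variance."""
    n = len(x)
    if n < 2:
        raise ValueError("at least two observations are needed")
    mean = sum(x) / n
    s_sq = sum((v - mean) ** 2 for v in x) / n                 # variance, divisor n
    delta_sq = sum((x[i + 1] - x[i]) ** 2 for i in range(n - 1)) / (n - 1)
    return delta_sq / s_sq


def epsilon(x: Sequence[float]) -> float:
    """epsilon defined by delta^2/s^2 = (2n/(n-1)) (1 - epsilon)."""
    n = len(x)
    return 1 - (n - 1) / (2 * n) * von_neumann_ratio(x)
```

For the steadily rising sample 1, 2, 3, 4 the ratio is 0.8 and $\epsilon$ is 0.7, just inside the intercept $\cos\frac{\pi}{4}$; a strictly alternating sample gives a negative $\epsilon$.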

For $n = 3$,

(1)  $\omega(\epsilon) = \frac{1}{\pi}\,\frac{1}{\sqrt{\frac{1}{4} - \epsilon^2}}$

for $\cos\frac{\pi}{3} \ge \epsilon \ge \cos\frac{2\pi}{3}$.

For n = 5, 


1 1 

w'{t) =-============^====r- 

’T V- t* + h^ - * 


( 2 ) 


= - - 

’r T 


27r 


IT , 2 t 

cos - + cos „ 
5 6 


sn 




"1 , TT 

cos - + cos -r“ * + COM “ 
_5_5 6 

IT 27r X 

cos y — COS -r cos~ e 
5 6 f» . 


.11 


T 2 Tr 

COH.r — COM V 

5 _ ^ 5_ 

’ ■*• , 2 x 

cosr + OOS ■- 

6 5 


for COS ^ ^ e a ^ and cos ^ S e g cos — . 
005 5 

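But for $\cos\frac{2\pi}{5} \ge \epsilon \ge \cos\frac{3\pi}{5}$, $\omega'(\epsilon) = 0$, thus

(3)  $\omega(\epsilon) = \text{const.}$

For $n = 7$,

(4)  $\omega''(\epsilon) = \pm\frac{2}{\pi}\,\frac{1}{\sqrt{-\epsilon^6 + \frac{5}{4}\epsilon^4 - \frac{3}{8}\epsilon^2 + \frac{1}{64}}}$

for $\cos\frac{\pi}{7} \ge \epsilon \ge \cos\frac{2\pi}{7}$ and $\cos\frac{5\pi}{7} \ge \epsilon \ge \cos\frac{6\pi}{7}$ with the + sign, and for
$\cos\frac{3\pi}{7} \ge \epsilon \ge \cos\frac{4\pi}{7}$ with the - sign.

But for $\cos\frac{2\pi}{7} \ge \epsilon \ge \cos\frac{3\pi}{7}$ and $\cos\frac{4\pi}{7} \ge \epsilon \ge \cos\frac{5\pi}{7}$, $\omega''(\epsilon) = 0$, thus

(5)  $\omega'(\epsilon) = \text{const.}$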

funcutru‘ied''fofn » 7T^\ numerical evnlualion of the inverse sine amplitude 
luncuon used for ny 4, 5, 6, is taken from unpuhl shed tables of the Levendrian ellintic 

ihe square of the modulus is the argument for this tabulation. 


Sr 





For even values of n von Neumann shows that the distribution of e, 

^ r[[(n - 2)]r(i) /ml 6) ~ 


For ?r = 4, 


£*'a+(o) 


W = i f 


© 


I -1 dp 


( 6 ) 


Vi < P's /1 — p ’ 

whe„ (0.1 [- (l - cos r)(l - 00.^)] 


wx+(0)(«) = f / ■ ^ 

V2 T Jvi . V (p - V2 e)(p + V2 t)(l - p) 


V2 


"■V1 + V2 


. sn 


-Yl 1 - V2t °\ 

\ ’ 1 + \/2 e / 


r ’I Stt 

for cos - S ^ . 

4 4 


,1 wx (-) 

For n = 6 , coa+(o)(«) = i / — dp, where 

«2«/</»A/l — n. 


(7) 


Wa 


©= 


ir(V3 + 1) 


*«/v» •\/1 — p 


TT TT Stt 5 

for cos - S « ^ cos - and cos ^ ^ cos and where 

o o o 6 


( 8 ) 


<■)(«) = const. 


for cos 5^ S S cos -5- . 
o o 


The integrals needed to obtain $\omega(\epsilon)$ for $n = 6$ and $\omega'(\epsilon)$ for $n = 7$ have been
evaluated by numerical quadrature. Graphs of the distribution of $\delta^2/s^2$, $\omega(\delta^2/s^2)$,
for $n = 3, 4, 5, 6, 7$, are shown in Fig. 1.

The integral $\int_0^k \omega(\delta^2/s^2)\,d(\delta^2/s^2)$ has been obtained
from $\omega(\delta^2/s^2)$ by numerical quadrature for $n = 4, 5, 6, 7$. The results
are given in Table III.

As is mentioned by von Neumann, R. H. Kent has suggested a series approximation
of the form

$\omega(\epsilon) = \sum_{h=0}^{\infty} a_h\left(\cos^2\frac{\pi}{n} - \epsilon^2\right)^{\frac{1}{2}n - 2 + h},$





FIG. 1


since the order of vanishing of $\omega(\epsilon)$ is $\frac{1}{2}n - 2$, and since $\omega(\epsilon)$ is an even function
of $\epsilon$ (loc. cit.,¹ p. 391). Determining the $a_h$ by the condition of normalization
and by the first three even moments of the actual distribution, $M_2$, $M_4$ and $M_6$
(given on pp. 377-378, loc. cit.¹), and integrating the result, we obtain

P(« < /o') = / ,23% (cos* - - «*) dt 

Hi Ml \ n } 


_ (n-l)(a+l){n+3) 
2 * 


Uibi - 2], Mra - 2]) 


— 1 -(- + 5)(?t + 7 ) + 5)(« 4* 7)(n + 9)~ 

cos* ~ 3 cos* - 45 cos® - 

'<> n Ti - 















(9) 


+ («±i.)(^+ 3)(«^ 


1 - 


MiOn + 13) 

2 IT 
COS - 

n 


+ 


Mi(Zn + ll)(n + 7) M,(n + 3)(n + 7)(7i + 9)‘ 


O 4 ^ 

3 cos ' 


+ + 21, Mn + 2]) 


IP 0 2r 

15 cos - 
n 

-] + + 11 ) 
cos^ - 


-<1/4(321 + 19)(iJ + 3) Mf,i) i + 3)(n -1~ 5)(n -|~ 9 ) 

3 cos'* - 15 cos® - 

71 n 


+ (r^_5KZi_+J7)_(n+_9_) ^ 


‘1 Main + 3) 


COS'^ 


+ 


i¥4(n + 3)(/i + 5) Mr,in + 3)(n + 5)(n + 7)‘ 


3 cos^ - 
n 


45 cos® - 
n 


The Tables of the Incomplete Beta-Function⁴ can be used to evaluate (9), with
$x = \frac{1}{2}\left(\frac{\epsilon}{\cos(\pi/n)} + 1\right)$. Table I shows the results obtained for the eighth and
tenth moments for the distribution (9) and for the true distribution for certain
values of $n$.

Table II gives a tabulation of $P(\delta^2/s^2 < k)$ for $n = 7$ by the use of (9) and by
the method of (4) and (5). The approximation (9) has been used for the computation
of the probabilities of Table III for $n \ge 8$.

It has been shown (loc. cit.,¹ pp. 378-379) that for $n \to \infty$ the distribution of
$\epsilon$ becomes asymptotically normal. For $n = 60$ values of $\delta^2/s^2$ are given below
for different levels of significance. These values have been computed from
Table III and from a table of the integral of the normal function with standard
deviation equal to $\frac{2n}{n-1}\sqrt{\frac{n-2}{(n-1)(n+1)}}$, the square root of the second
moment of the distribution of $\delta^2/s^2$.


⁴ Karl Pearson (Editor), Tables of the Incomplete Beta-Function, London: Biometrika
Office, 1934.

⁵ The results obtained by L. C. Young using the Pearson Type II distribution are sufficiently
precise for the significance levels and sample sizes tabulated. Cf. L. C. Young,
"On randomness in ordered sequences," Annals of Math. Stat., Vol. 12 (1941), pp. 293-300.






TABLE I


              $M_8$                 $M_{10}$

$n$       (9)       True        (9)       True

7 

.0(M12 

fxitll 

,00201 

(xr2oa 

H 

00.1 IS 1 

OOllS 

mii.v) 

.Ofll.'il 

0 

,00240 1 

1 

.00240 i 

J 

ooin 

.(K1U2 


TABLE II
$P(\delta^2/s^2 < k)$ for $n = 7$


$k$       By (9)       By (4) and (5)

.25 

.00001 i 

■ tXXXIl 

.30 

00007 1 

00007 

,35 

.00027 1 

tKX127 

40 

0006,5 

mm 

.45 

.00124 ! 

(Xll'ill 

,50 

.00209 ( 

WWM 

,55 

(X)326 

.003,‘W 

60 

0017H 1 

(XMWl 

65 

00071 ; 

1XX17S 

70 

00011 1 

tHKlI3 

,75 

01203 i 

1 01197 

.80 

01552 

01534 

85 

01064 

01032 

90 

02443 

,02403 

.06 

.02996 

.02957 

1 00 

.03624 

.0359.S 

1 05 

.04333 

.CM326 

1,10 

.06126 

.06137 

1,16 

.06000 

.000.30 

1 20 

.06976 

.07020 

Values of $\delta^2/s^2$ for Different Levels of Significance


$n = 60$

             P = .001    P = .005    P = .01    P = .05

Table III    1.2668      1.3779      1.4384     1.6082

Normal       1.2368      1.3688      1.4333     1.6092
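The "Normal" row can be reproduced approximately with a short Python sketch. It assumes that the normal approximation is centred at $\frac{2n}{n-1}$, the value of $\delta^2/s^2$ corresponding to $\epsilon = 0$, with the standard deviation $\frac{2n}{n-1}\sqrt{\frac{n-2}{(n-1)(n+1)}}$; the function name is hypothetical.

```python
import math
from statistics import NormalDist


def normal_critical_value(n: int, p: float) -> float:
    """Lower-tail value k with P(delta^2/s^2 < k) = p under the normal approximation."""
    mean = 2 * n / (n - 1)
    sd = mean * math.sqrt((n - 2) / ((n - 1) * (n + 1)))
    return NormalDist(mean, sd).inv_cdf(p)
```

For $n = 60$ this gives about 1.609 at $P = .05$, 1.433 at $P = .01$ and 1.369 at $P = .005$.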


This work was undertaken at the suggestion of Mr. R. H. Kent. I am
much indebted to him and to Professor John von Neumann for many important
suggestions and criticisms.


Note to Fig. 1, by John von Neumann. Inspection of the graphs of w{sVs*) 
for n = 3, 4, 5, 6, 7 (see Fig. 1) discloses certain singularities of the function 
w(5Vs 7, which seem to deserve attention. 




TABLE III

   k    n = 4      5        6        7        8        9       10       11       12
  .25                             .00001   .00001   .00001   .00001
  .30                             .00007   .00007   .00005   .00004   .00002   .00001
  .35                    .00006   .00027   .00021   .00014   .00009   .00005   .00003
  .40                    .00047   .00065   .00047   .00031   .00019   .00012   .00007
  .45                    .00126   .00126   .00088   .00059   .00038   .00025   .00016
  .50           .00038   .00246   .00214   .00150   .00103   .00069   .00046   .00031
  .55           .00223   .00409   .00333   .00237   .00168   .00116   .00080   .00055
  .60           .00493   .00615   .00486   .00355   .00259   .00185   .00132   .00094
  .65           .00830   .00865   .00678   .00511   .00382   .00282   .00208   .00152
  .70           .01226   .01161   .00913   .00710   .00544   .00414   .00313   .00235
  .75           .01673   .01505   .01197   .00958   .00753   .00587   .00455   .00351
  .80  .00356   .02171   .01900   .01534   .01263   .01015   .00809   .00642   .00508
  .85  .01302   .02717   .02348   .01932   .01631   .01338   .01089   .00883   .00714
  .90  .02257   .03310   .02851   .02403   .02068   .01729   .01436   .01188   .00980
  .95  .03223   .03949   .03412   .02957   .02579   .02196   .01858   .01565   .01310
 1.00  .04199   .04634   .04035   .03598   .03171   .02745   .02363   .02025   .01733
 1.05  .05186   .05364   .04728   .04325   .03849   .03384   .02959   .02578   .02241
 1.10  .06184   .06140   .05500   .05137   .04618   .04120   .03655   .03232   .02852
 1.15  .07194   .06963   .06361   .06036   .05482   .04957   .04458   .03997   .03577
 1.20                    .07323   .07020   .06445   .05901   .05375   .04882   .04425
 1.25                                               .06956   .06412   .05894   .05407
 1.30                                                                 .07040   .06531

   k   n = 15     20       25       30       40       50       60
  .35  .00001
  .40  .00002
  .45  .00004
  .50  .00009   .00001
  .55  .00018   .00002
  .60  .00033   .00005   .00001
  .65  .00059   .00012   .00002
  .70  .00100   .00024   .00005   .00001
  .75  .00161   .00044   .00011   .00003
  .80  .00250   .00076   .00023   .00007   .00001
  .85  .00375   .00127   .00044   .00015   .00002
  .90  .00547   .00206   .00079   .00030   .00004   .00001
  .95  .00778   .00323   .00135   .00057   .00010   .00002
 1.00  .01079   .00489   .00222   .00102   .00022   .00005   .00001
 1.05  .01465   .00720   .00355   .00176   .00044   .00012   .00003
 1.10  .01950   .01033   .00550   .00294   .00085   .00026   .00008
 1.15  .02550   .01448   .00826   .00474   .00158   .00054   .00019
 1.20  .03280   .01986   .01208   .00738   .00280   .00108   .00043
 1.25  .04155   .02670   .01723   .01117   .00476   .00206   .00092
 1.30  .05189   .03524   .02402   .01644   .00780   .00376   .00185
 1.35  .06396   .04571   .03276   .02357   .01235   .00656   .00355
 1.40  .07787   .05834   .04379   .03298   .01892   .01098   .00649
 1.45           .07333   .05743   .04511   .02810   .01769   .01133
 1.50                    .07398   .06038   .04055   .02750   .01893
 1.55                             .07920   .05690   .04131   .03034
 1.60                                      .07797   .06000   .04675
 1.65                                               .08405   .06912
 1.70                                                        .09949



Values of k for which $P(\delta^2/s^2 < k) = 0$

    n      k         n      k
    4    .7811      15    .0468
    5    .4775      20    .0259
    6    .3215      25    .0164
    7    .2311      30    .0113
    8    .1740      40    .0063
    9    .1357      50    .0040
   10    .1088      60    .0028
   11    .0891
   12    .0743
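
These lower limits can be reproduced from a closed form. The sketch below rests on assumptions not stated in this section: that $\delta^2$ is the sum of squared successive differences divided by n - 1, that $s^2$ is the sum of squared deviations divided by n, and that the smallest non-zero characteristic value of the successive-difference form is $4\sin^2(\pi/2n)$. The function name is illustrative.

```python
from math import pi, sin

def lower_bound(n):
    """Smallest attainable value of delta^2/s^2 for sample size n (assumed formula)."""
    return (n / (n - 1)) * 4.0 * sin(pi / (2 * n)) ** 2

for n in (4, 5, 6, 7, 8, 9, 10, 11, 12, 15, 20, 25, 30, 40, 50, 60):
    print(n, round(lower_bound(n), 4))
```

Under these assumptions the printed values match the tabulated k to four decimals.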




IRVING W. BURR


Furthermore it may be shown that

(2)  $F(x) = \int_{-\infty}^{x} F'(\xi)\,d\xi, \qquad F'(x) = f(x),$

where f(x) is the ordinary probability function. Also

(3)  $P(a < x < b) = \int_{a}^{b} f(x)\,dx.$

Similarly for the discrete case,

(4) Fix) = Sa m, A Fix) = Ifii), 

(5) Pia<x<b) = Fib + h) - Fia) = 

where a, b are among the values nh + d, and $\Delta_h$ is the usual h-difference. In both cases the percentiles are given by the solutions of the equations

(6)  $F(x) = n/100.$

Equations (1), (3) and (6) formulate the advantage to the direct use of F(x) which was mentioned in section 1. Related to this is the fact that the process of finding f(x) from F(x) is at least theoretically much simpler than conversely, as (2) and (4) show. The directness of equation (6) is often an advantage also.

The main problems confronting one in trying to utilize these advantages are (a) to find suitable cumulative functions and (b) to find methods of fitting F(x) directly. These are next discussed.

3. Some special functions F(x). An obvious method of attack is to use (2) or (4) on some f(x). The integration involved is precisely the difficulty the writer wishes to avoid. The cumulative function might be sought directly in probability theory. A differential equation incorporating some of the properties of F(x) given in section 2 is

(7)  $\frac{dy}{dx} = y(1 - y)\,g(x, y), \qquad y = F(x),$

where g(x, y) is to be positive for 0 < y < 1 and x in the range over which the solution is to be used. It is to be noted that (7) is very similar to the differential equation

$\frac{dy}{dx} = y(m - x)\,g(x, y), \qquad y = f(x),$

which generates the Pearson system if $g(x, y) = (a + bx + cx^2)^{-1}$.



CUMULATIVE FREQUENCY FUNCTIONS 




Equation (7) implies the non-decreasing property for F(x), while for many choices of g(x, y), dy/dx will be zero at y = 0 and y = 1. When g(x, y) = g(x), (7) becomes

(8)  $F(x) = \left[e^{-\int g(x)\,dx} + 1\right]^{-1}.$

Some functions g(x) whose integrals are such that F(x) increases from 0 to 1 on the interval $-\infty < x < \infty$ are c, $cx^{-1}$, $[(c - x)x]^{-1}$, $c\sec^2 x$ and $c\cosh x$, where c > 0. Generalizations of their corresponding F(x) are given below in (10)-(14) respectively.
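
As a small numerical illustration, the following sketch checks that the form (8) satisfies the differential equation (7), read here as dy/dx = y(1 - y)g(x), for the simplest choice g(x) = c. The constant c = 1.5 and all function names are hypothetical choices made for the example.

```python
from math import exp

def F_from_g_integral(G):
    """Equation (8): F = 1 / (exp(-G) + 1), where G(x) is an antiderivative of g(x)."""
    return lambda x: 1.0 / (exp(-G(x)) + 1.0)

c = 1.5                                   # hypothetical constant, g(x) = c
F = F_from_g_integral(lambda x: c * x)    # antiderivative of g is c*x

def residual(x, h=1e-5):
    """Central-difference estimate of dF/dx minus the right side of (7)."""
    dF = (F(x + h) - F(x - h)) / (2 * h)
    y = F(x)
    return dF - y * (1.0 - y) * c

for x in (-2.0, 0.0, 0.7, 3.0):
    print(x, round(F(x), 6), residual(x))
```

The residuals are zero up to the error of the finite difference, and F rises from 0 toward 1 as x increases.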

Another method of attack is to simply consider functions which have the properties given in section 2. The assumption of high contact provides for the existence of certain integrals to be discussed in section 5. Many functions having the required properties are to be found in tables of definite integrals, particularly Bierens de Haan [1].

A list of particular F(x) is given below. In all cases the number of parameters would be increased by two by letting $x = \gamma x' + \delta$, where $\gamma$ and $\delta$ fix the origin and scale. These parameters are determined by $\bar{x}$ and $\sigma$. The range of x over which the given expression is to be used is written to the right when it is not $(-\infty, \infty)$. Constants k, r and c are positive real numbers.


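(9)   $F(x) = x$,  $(0, 1)$,

(10)  $F(x) = (e^{-x} + 1)^{-r}$,

(11)  $F(x) = (x^{-k} + 1)^{-r}$,  $(0, \infty)$,

(12)  $F(x) = \left[\left(\frac{c - x}{x}\right)^{1/c} + 1\right]^{-r}$,  $(0, c)$,

(13)  $F(x) = (ke^{-\tan x} + 1)^{-r}$,  $(-\pi/2, \pi/2)$,

(14)  $F(x) = (ke^{-c\sinh x} + 1)^{-r}$,

(15)  $F(x) = 2^{-r}(1 + \tanh x)^{r}$,

(16)  $F(x) = \left(\frac{2}{\pi}\arctan e^{x}\right)^{r}$,

(17)  $F(x) = 1 - \frac{2}{k[(1 + e^{x})^{r} - 1] + 2}$,

(18)  $F(x) = (1 - e^{-x^2})^{r}$,  $(0, \infty)$,

(19)  $F(x) = \left(x - \frac{1}{2\pi}\sin 2\pi x\right)^{r}$,  $(0, 1)$,

(20)  $F(x) = 1 - (1 + x^{c})^{-k}$,  $(0, \infty)$.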


Most of these functions have unimodal probability functions f(x), and all of the functions may be readily handled from the calculational standpoint. To check upon their suitability for practical work, the values of $\alpha_3$ and $\alpha_4$ for some special cases were obtained approximately by evaluating F(x) at a convenient regular interval, differencing, and using the results as frequencies of a discrete

TABLE I

Calculated $\alpha_3$ and $\alpha_4$ for special functions F(x)


runclion 1 

_____ \ 

?aramelers 


"4 

«, 

(16) i 

j - f 


(1 

4.01 

(16) 

r =» I 


t) 

2.2! 

(17) 

k => 1, r “ 

2 i 

-.62 

•1 .50 

(17) 

I =» 2, r = 

1 1 

0 

4.11 

(17) 

k = 2, r = 

2 ' 

" 1 

- 5f 

4.22 

(18)» 

r = 1 

j 

.62 

.■).25 

(10)= 

r = 1 

1 

1 

f) 

2 41 


variable. No correction for grouping was made. The values of $\alpha_3$ and $\alpha_4$ for several of the above functions are given in Table I, where

(21)  $\mu'_j = \int_{-\infty}^{\infty} x^j f(x)\,dx$  or  $\sum_{x=-\infty}^{\infty} x^j f(x)$,

      $\mu_j = \int_{-\infty}^{\infty} (x - \mu'_1)^j f(x)\,dx$  or  $\sum_{x=-\infty}^{\infty} (x - \mu'_1)^j f(x)$,

      $\alpha_j = \mu_j/\sigma^j, \qquad \sigma^2 = \mu_2.$

It will be seen that a variety of values of $\alpha_4$ appear. The values of $\alpha_3$ vary considerably in most cases as r varies. These functions show promise of being useful after further investigation. The values of $\alpha_3$ and $\alpha_4$ for (20) are convenient and adaptable. This function will be discussed in detail in section 6.

4. Methods of fitting F(x). The problem of graduation of data by a cumulative function involves three steps, (a) the selection of the type of function, (b) the determination of the parameters of the function, and (c) the graduation. The first two are often determined by such moment characteristics as $\alpha_3$ and $\alpha_4$, as in the Pearson system of frequency functions. The third step involves integration or summation if f(x) is used, whereas, once F(x) is fitted, all that remains to be done is evaluation of the function and differencing.
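
The third step can be sketched in a few lines. The example is illustrative only: the class limits, the total of 1000 and the choice of function (20) with c = 2, k = 3 are hypothetical, and `graduate` is a name introduced here.

```python
def graduate(F, limits, total):
    """Theoretical class frequencies: total * [F(upper) - F(lower)] for consecutive limits."""
    values = [F(x) for x in limits]
    return [total * (hi - lo) for lo, hi in zip(values, values[1:])]

# Hypothetical illustration with function (20), c = 2 and k = 3.
F = lambda x: 1.0 - (1.0 + x ** 2) ** -3 if x > 0 else 0.0
limits = [0.0, 0.25, 0.5, 0.75, 1.0, 1.5, 2.0, 3.0]
freqs = graduate(F, limits, total=1000)
print([round(f, 2) for f in freqs])
```

No integration or summation of f(x) is needed; the frequencies come from one evaluation of F(x) per class limit.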

To fit F(x) by moments, it must be possible to determine the parameters of F(x) from $\bar{x}$, $\sigma$, $\alpha_3$ and $\alpha_4$. The cumulative moments described in the next section, when they can be evaluated, will lead to the values of the $\bar{x}$, $\sigma$, $\alpha_3$ and $\alpha_4$ for various values of the parameters. If the relations between the parameters and the moments are difficult or impossible to obtain, then tables may be constructed and interpolation used. The usual process would be to use the $\alpha_3$

3 The method of moments of section 5 was used for these values.







and $\alpha_4$ tables to determine the primary parameters such as c, k and r in (9)-(20). Then for the given values of c, k, r, one computes the corresponding values of $\bar{x}$ and $\sigma$ from their tables, and these are used to obtain the parameters $\gamma$ and $\delta$ for $x = \gamma x' + \delta$. This procedure is illustrated in section 6.

Even when the cumulative moments cannot be evaluated, this method is still possible. Graduation by a small interval is used to construct tables of $\bar{x}$, $\sigma$, $\alpha_3$ and $\alpha_4$ for varying values of the parameters. Then the table can be used as described above. Thus it is seen that in practice any F(x) can be fitted by this technique.
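
One table entry of this kind can be sketched as follows. The grid, the range of integration and the choice of function (20) with c = 6, k = 4 are assumptions made for the example, and the function name is illustrative.

```python
from math import sqrt

def moments_by_differencing(F, lo, hi, steps):
    """Approximate mean, sigma, alpha3, alpha4 by differencing F on a fine grid.

    Each cell's probability F(x+h) - F(x) is placed at the cell midpoint;
    no correction for grouping is applied.
    """
    h = (hi - lo) / steps
    xs = [lo + (i + 0.5) * h for i in range(steps)]
    ps = [F(lo + (i + 1) * h) - F(lo + i * h) for i in range(steps)]
    total = sum(ps)
    mean = sum(p * x for p, x in zip(ps, xs)) / total
    mu = [sum(p * (x - mean) ** j for p, x in zip(ps, xs)) / total for j in (2, 3, 4)]
    sigma = sqrt(mu[0])
    return mean, sigma, mu[1] / sigma ** 3, mu[2] / sigma ** 4

burr = lambda x, c=6, k=4: 1.0 - (1.0 + x ** c) ** -k   # function (20), x >= 0
mean, sigma, a3, a4 = moments_by_differencing(burr, 0.0, 6.0, 6000)
print(round(mean, 4), round(sigma, 4), round(a3, 3), round(a4, 3))
```

Repeating the call over a grid of c and k would give a table from which parameters can be found by interpolation.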

The usefulness of a cumulative or a probability function depends upon how wide a range of sets of values of the $\alpha_i$ the function covers, and whether such values occur in practice. In most of the functions (9)-(20), $\alpha_3$ and $\alpha_4$ are continuous functions of the parameters. If there is only one parameter then only $\alpha_3$ (or $\alpha_4$) can be fitted in the range of values of $\alpha_3$ which the function possesses, but in the case of two parameters both $\alpha_3$ and $\alpha_4$ can be fitted. Three or more parameters permit $\alpha_5$ etc. to be fitted.


5. Cumulative moment theory for F(x). A moment definition for F(x) is now presented. Since for $j \ge 0$, $\lim_{b\to\infty}\int_{-\infty}^{b} x^j F(x)\,dx = \infty$, the integral $\int_{-\infty}^{\infty} x^j F(x)\,dx$ cannot be used. However, it was assumed in section 2 that for some k > j + 1, $[1 - F(x)]x^k$ is ultimately bounded. Hence, $\lim_{x\to\infty}[1 - F(x)]x^j = 0$. Thus 1 - F(x) can be used as a factor when integrating over any interval $(a, \infty)$, a being finite. But the factor F(x) must be used for an interval of the type $(-\infty, b)$. Two integrals are needed, and we define the cumulative moment, $M_j(a)$, by

(22)  $M_j(a) = \int_{a}^{\infty} (x - a)^j[1 - F(x)]\,dx - \int_{-\infty}^{a} (x - a)^j F(x)\,dx,$

which exists under the assumptions of section 2. The difference of the integrals is used because, as will be shown, this leads to simpler results than could be obtained by addition. If a = 0 in (22) then calling $M_j(0) = M_j$,

(23)  $M_j = \int_{0}^{\infty} x^j[1 - F(x)]\,dx - \int_{-\infty}^{0} x^j F(x)\,dx.$
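
Definition (23) can be tried numerically. The sketch below uses the midpoint rule on function (10) with r = 1 (the logistic function); the truncation point and step count are arbitrary choices. The comparison of $2M_1$ with the second moment $\pi^2/3$ relies on relation (35), which is derived later in this section.

```python
from math import exp, pi

def cumulative_moment(F, j, upper=60.0, steps=100_000):
    """M_j of (23) by the midpoint rule, truncating both integrals at +/- upper."""
    h = upper / steps
    pos = sum(((i + 0.5) * h) ** j * (1.0 - F((i + 0.5) * h)) for i in range(steps))
    neg = sum((-(i + 0.5) * h) ** j * F(-(i + 0.5) * h) for i in range(steps))
    return h * (pos - neg)

logistic = lambda x: 1.0 / (1.0 + exp(-x))       # function (10) with r = 1
M0 = cumulative_moment(logistic, 0)
M1 = cumulative_moment(logistic, 1)
print(M0, 2 * M1, pi ** 2 / 3)
```

Here $M_0$ is the mean (zero by symmetry) and $2M_1$ reproduces the second moment about the origin.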


Definitions for the discrete case are similar

(24)  $M_j(a) = h\sum_{x=a+h}^{\infty} (x - a)^{(j)}[1 - F(x)] - h\sum_{x=-\infty}^{a} (x - a)^{(j)} F(x),$

(25)  $M_j = h\sum_{x=h}^{\infty} x^{(j)}[1 - F(x)] - h\sum_{x=-\infty}^{0} x^{(j)} F(x),$


where $z^{(j)} = z(z - h)\cdots(z - \overline{j-1}\,h)$. This function is used because it has simpler properties in the finite calculus than has $z^j$.





Various relations between the cumulative moments $M_j(a)$ and $M_j$, and between these and $\mu'_j$, $\mu_j$ and $\alpha_j$ of (21) are now developed. To express $M_j(a)$ in terms of $M_i$'s, use $(x - a)^j = \sum_{i=0}^{j} {}_jC_i\,x^{j-i}(-a)^i$. Thus,

$M_j(a) = \int_{a}^{\infty} (x - a)^j[1 - F(x)]\,dx - \int_{-\infty}^{a} (x - a)^j F(x)\,dx$

$\qquad = \int_{0}^{\infty} (x - a)^j[1 - F(x)]\,dx - \int_{-\infty}^{0} (x - a)^j F(x)\,dx - \int_{0}^{a} (x - a)^j\,dx,$

(26)  $M_j(a) = \sum_{i=0}^{j} {}_jC_i(-a)^i M_{j-i} + \frac{(-a)^{j+1}}{j+1}.$

One reason for the minus sign of (22) may be noted here, because in the contrary case the last term would be $\int_{0}^{a} (x - a)^j[2F(x) - 1]\,dx$. By translating the origin in (26) to x = a, renaming the moments, and replacing -a by a, one obtains

(27)  $M_j = \sum_{i=0}^{j} {}_jC_i\,a^i M_{j-i}(a) + \frac{a^{j+1}}{j+1}.$

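To bring in ordinary moments, integration-by-parts and (2) are used,

$M_j(a) = \left[\frac{(x - a)^{j+1}}{j+1}\,[1 - F(x)]\right]_{a}^{\infty} + \int_{a}^{\infty} \frac{(x - a)^{j+1}}{j+1} f(x)\,dx - \left[\frac{(x - a)^{j+1}}{j+1}\,F(x)\right]_{-\infty}^{a} + \int_{-\infty}^{a} \frac{(x - a)^{j+1}}{j+1} f(x)\,dx$

(28)  $\qquad = \frac{1}{j+1}\int_{-\infty}^{\infty} (x - a)^{j+1} f(x)\,dx,$

the first and third quantities vanishing because of the contact assumption. A second justification of the minus sign of (22) appears here, since if a positive sign were used, the fourth term would have been subtracted and the integrals would not combine into (28). Expansion of $(x - a)^{j+1}$ in powers of x and $x - \mu'_1$ yields respectively

(29)  $M_j(a) = \frac{1}{j+1}\sum_{i=0}^{j+1} {}_{j+1}C_i(-a)^i\mu'_{j+1-i},$

(30)  $M_j(a) = \frac{1}{j+1}\sum_{i=0}^{j+1} {}_{j+1}C_i(\mu'_1 - a)^i\mu_{j+1-i}.$

Also setting a = 0,

(31)  $M_j = \frac{\mu'_{j+1}}{j+1},$

(32)  $M_j = \frac{1}{j+1}\sum_{i=0}^{j+1} {}_{j+1}C_i\,{\mu'_1}^{i}\mu_{j+1-i}.$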





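It may be shown that the existence of $M_j(a)$ implies that of the $\mu'_i$, i = 1, ..., j + 1, and conversely if $\mu'_j$ exists then so do the $M_i(a)$, i = 0, ..., j - 1. The following formulas are obtained by the opposite integration by parts, taking two different forms for $\int f(x)\,dx$: F(x) and -[1 - F(x)], to avoid indeterminate situations.

$\mu'_j = \int_{a}^{\infty} x^j f(x)\,dx + \int_{-\infty}^{a} x^j f(x)\,dx$

$\qquad = -\left[x^j[1 - F(x)]\right]_{a}^{\infty} + j\int_{a}^{\infty} x^{j-1}[1 - F(x)]\,dx + \left[x^j F(x)\right]_{-\infty}^{a} - j\int_{-\infty}^{a} x^{j-1} F(x)\,dx.$

The first and third terms vanish at the infinite limits by the contact assumption, and at x = a they contribute $a^j$ together. Then using $(x - a + a)^{j-1}$ for $x^{j-1}$,

(33)  $\mu'_j = j\sum_{i=0}^{j-1} {}_{j-1}C_i\,a^i M_{j-1-i}(a) + a^j, \qquad j > 0.$

Also in the same manner

$\mu_j = j\sum_{i=0}^{j-1} {}_{j-1}C_i(a - \mu'_1)^i M_{j-1-i}(a) + (a - \mu'_1)^j, \qquad j > 0,$

(34)  $\mu_j = j\sum_{i=0}^{j-1} {}_{j-1}C_i[-M_0(a)]^i M_{j-1-i}(a) + [-M_0(a)]^j, \qquad j > 1,$

using (29), $M_0(a) = \mu'_1 - a$. Letting a = 0,

(35)  $\mu'_j = jM_{j-1}, \qquad j > 0,$

(36)  $\mu_j = j\sum_{i=0}^{j-1} {}_{j-1}C_i(-M_0)^i M_{j-1-i} + (-M_0)^j, \qquad j > 1.$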

An interesting graphical property of F(x) may be seen from (35), j = 1, by taking $\mu'_1 = 0$. Then $M_0 = 0$ and hence $\int_{0}^{\infty}[1 - F(x)]\,dx = \int_{-\infty}^{0} F(x)\,dx$. Thus the mean is that ordinate which equates the two areas bounded by (i) y = F(x), y = 0 and $x = \mu'_1$ and (ii) y = F(x), y = 1 and $x = \mu'_1$.
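
The equal-area property is easy to confirm numerically. The sketch below takes function (20) with c = 2, k = 2, whose mean is $\pi/4$; the truncation at x = 2000 and the midpoint rule are choices made for the example, and `areas_about` is an illustrative name.

```python
from math import pi

def areas_about(F, m, lo, hi, steps=100_000):
    """Midpoint-rule areas: under F to the left of m, and between F and 1 to the right."""
    hl, hr = (m - lo) / steps, (hi - m) / steps
    left = hl * sum(F(lo + (i + 0.5) * hl) for i in range(steps))
    right = hr * sum(1.0 - F(m + (i + 0.5) * hr) for i in range(steps))
    return left, right

F = lambda x: 1.0 - (1.0 + x * x) ** -2 if x > 0 else 0.0   # function (20), c = 2, k = 2
mean = pi / 4                                                # mean of this F
left, right = areas_about(F, mean, 0.0, 2000.0)
print(left, right)
```

The two areas agree to within the accuracy of the quadrature.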

It is worth noting that the expressions (34) and (36) have the same coefficients, independent of a. This is to be expected because of the invariance of $\mu_j$ under translation.

If $a = \mu'_1$ then (30) simplifies to $M_j(\mu'_1) = \frac{\mu_{j+1}}{j+1}$. Lastly, expressions for $\alpha_j$'s in terms of the $M_j(a)$'s are given.

(37)  $\alpha_3 = \frac{3M_2(a) - 6M_1(a)M_0(a) + 2M_0^3(a)}{[2M_1(a) - M_0^2(a)]^{3/2}},$

      $\alpha_4 = \frac{4M_3(a) - 12M_2(a)M_0(a) + 12M_1(a)M_0^2(a) - 3M_0^4(a)}{[2M_1(a) - M_0^2(a)]^{2}}.$

The discrete case has been carried through in an exactly similar manner by the use of finite rather than infinitesimal calculus. Only the results will be stated here. The notation used is that of Steffensen [2].



M,(a) 


a 4- (r - ,, 




(38) 


rMfO 

1 







. t-iv“ 







4- W+ij~ 

■ DA)''*’' 

J > 

(39) 

Mg(a) 

~ il/fl “t* (1 





(40) 

M, 

" ±,CrJ'“M.M + <" + 






f-O 

J 4- 1 




(41) 

M,(a) 

= —\ r 

7 + 1 

k\ 

~*(/t - 

0)'-^ 

J > 0 

(42) 

Jlfo(a) 

= gi — a 





(43) 

j¥,(a) 

1 j+i 

= T 

j 4- 1 £o 


4- A - a)** 

J > ( 

(44) 

M, : 

= --L. V 
j + 1 

k^r k ! ' 

l}*+/+i_ 

j>0 

(45) 

*^0 ^ Ml 





(46) 

M,-. 

1 '+‘ 

~ in S ‘ 

t! ( 

‘(m( 4- A)*-', 

j > 0 

(47) 

/ 

M; = 


r)(a - 



li) = 

= [-il/„(a)]^ 





(48) 



1 y 






r«-( 


~r)(-~ 

■A/o(a) - 


(49) 

/ 

= 

^ n*—' 




(50) 

M, = 

(-W + giif. t h-^^-P2,cih 

>■-.0 ifcj 

“ r)[~- 

Mo - A]' 






The writer has verified that under certain fairly general conditions the discrete case (38)-(50) approaches the continuous case (26)-(36) as h → 0.

The following three propositions are merely stated without proof since they follow so immediately from (23), (25), (31), (45), (21), (2) and (4).

PROPOSITION 1. Given a set of functions $F_i(x)$ and positive constants $k_i$, i = 1, ..., n for which $\sum_{i=1}^{n} k_i = 1$, then for $F(x) = \sum_{i=1}^{n} k_i F_i(x)$, $M_j = \sum_{i=1}^{n} k_i\,{}_iM_j$ if all the latter exist.

PROPOSITION 2. In the above notation, if all the ${}_i\mu'_1$ are equal, then $\mu_j = \sum_{i=1}^{n} k_i\,{}_i\mu_j$, when the latter exist.

PROPOSITION 3. If in addition to the above hypotheses, all the ${}_i\sigma$ are equal, then

(51)  $\alpha_j = \sum_{i=1}^{n} k_i\,{}_i\alpha_j.$

These propositions are sometimes convenient in forming a linear combination of functions F(x), to obtain a function with desired properties. It may be noted that Proposition 1 is still algebraically true even with negative $k_i$'s, but these might give negative derivatives f(x) for F(x).

6. An algebraic function, $F(x) = 1 - \frac{1}{(1 + x^c)^k}$. This simple algebraic cumulative function will be discussed in detail. The $\alpha_j$ can be calculated directly by the application of (23), (36) and (21). The resulting $\alpha_3$ and $\alpha_4$ values cover a broad range, within which those of many empirical and theoretical distributions lie. A method of finding such cumulative functions with desired $\alpha_3$ and $\alpha_4$ will be given. Several graduations are presented for illustration.

This function appears in Bierens de Haan [1] and has the desired properties. The writer has not yet found a probability justification for the function. However, since the $\alpha_j$ are so close to those of functions which can be so supported, it seems that it may eventually prove to be at least some definite approximation to a probability situation.

The complete definition is

(52)  $F(x) = 1 - \frac{1}{(1 + x^c)^k}, \quad x \ge 0; \qquad F(x) = 0, \quad x < 0,$

where c, k ≥ 1 are real numbers. The probability function

(53)  $F'(x) = f(x) = \frac{kc\,x^{c-1}}{(1 + x^c)^{k+1}}$

is unimodal at $x = \left(\frac{c - 1}{ck + 1}\right)^{1/c}$ if c > 1, and L-shaped if c = 1.





Use of (23) on (52) gives

(54)  $M_j = \int_{0}^{\infty} \frac{x^j\,dx}{(1 + x^c)^k}, \qquad j < ck - 1.$

But from Bierens de Haan [1]

(55)  $\int_{0}^{\infty} \frac{x^{q-1}\,dx}{(1 + x^c)^k} = \frac{(c - q)^{k-1|c}\,\pi}{c^k\,(k - 1)!\,\sin(q\pi/c)}, \qquad q < c,$

where $a^{r|c} = a(a + c)\cdots(a + \overline{r-1}\,c)$. Hence

(56)  $M_j = \frac{(c - j - 1)^{k-1|c}\,\pi}{c^k\,(k - 1)!\,\sin\frac{(j+1)\pi}{c}}, \qquad j < c - 1.$

However, if j > c - 1 then (55) can still be used through reducing the exponent of x by $x^{j-c}(1 + x^c) - x^{j-c} = x^j$. (56) is only good for integral values of k. A more general formula is obtainable by letting $(1 + x^c) = 1/s$. Then

(57)  $M_j = \frac{1}{c}\int_{0}^{1} s^{\,k - \frac{j+1}{c} - 1}(1 - s)^{\frac{j+1}{c} - 1}\,ds = \frac{\Gamma\left(k - \frac{j+1}{c}\right)\Gamma\left(\frac{j+1}{c}\right)}{c\,\Gamma(k)},$

for j = 0, 1, ... up through j < ck - 1, and c, k any real numbers ≥ 1. To determine the $\mu_j$ values the easiest way is to compute the values of the $M_j$ by (56) or (57), and then to use (36):

$\mu_2 = 2M_1 - M_0^2, \qquad \mu_3 = 3M_2 - 6M_1M_0 + 2M_0^3,$

$\mu_4 = 4M_3 - 12M_2M_0 + 12M_1M_0^2 - 3M_0^4,$ etc.

Having these, definitions (21) are used for the $\alpha_j$.
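
The computation just described can be sketched with the gamma-function form (57). The code is an illustration rather than the author's procedure (which used (56) with eight-place tables); function names are introduced here, and the case c = 6, k = 4 is chosen because its mean and standard deviation are quoted later in this section.

```python
from math import gamma, sqrt

def M(j, c, k):
    """Cumulative moment M_j of F(x) = 1 - (1 + x^c)^(-k), by formula (57)."""
    a = (j + 1) / c
    if a >= k:
        raise ValueError("M_j exists only for j < ck - 1")
    return gamma(a) * gamma(k - a) / (c * gamma(k))

def characteristics(c, k):
    """Mean, sigma, alpha3, alpha4 from M_0..M_3 via (36) and (21)."""
    M0, M1, M2, M3 = (M(j, c, k) for j in range(4))
    mu2 = 2 * M1 - M0 ** 2
    mu3 = 3 * M2 - 6 * M1 * M0 + 2 * M0 ** 3
    mu4 = 4 * M3 - 12 * M2 * M0 + 12 * M1 * M0 ** 2 - 3 * M0 ** 4
    sigma = sqrt(mu2)
    return M0, sigma, mu3 / sigma ** 3, mu4 / mu2 ** 2

mean, sigma, a3, a4 = characteristics(6, 4)
print(round(mean, 5), round(sigma, 5), round(a3, 3), round(a4, 3))
```

The mean is $M_0$ by (35). The heavy cancellation in $\mu_3$ and $\mu_4$ is visible if the intermediate terms are printed.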

The results for some integral values of k and c are given in Tables II and III. These computations were made from (56). Formula (57) shows that for a fixed c, $M_j$ for k + 1 is obtained by multiplying $M_j$ for k by $\frac{kc - j - 1}{kc}$. This recursion relation is very helpful in the computation, because it enables all of the values of the $M_j$'s for a given c to be found from those for the lowest value of k for which $M_j$ exists. The values which need to be copied down in the computation for $\mu'_1$, $\sigma$, $\alpha_3$, $\alpha_4$ by a calculating machine are $M_0$, $M_1$, $M_2$, $M_3$, $M_0^2$, $M_0^3$, $M_0^4$, $6M_0$, $12M_0$, $12M_0^2$, $\mu_2$, $\sigma$, $\sigma^3$, $\mu_3$, $\alpha_3$, $\mu_4$, $\alpha_4$. Because of heavy cancellation, especially in $\mu_3$ and $\mu_4$, it seemed advisable to use eight significant figures throughout. Eight-place sines were obtained from Gifford [3]. The values of the $M_j$ for k = 11 were also checked by eight-place logarithms [4]. These verify the values of the $M_j$ for k < 11 because of the recurrence calculation.
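
The recursion in k can be checked against the direct formula. In this sketch the gamma-function form (57) supplies the starting value; the case c = 3, j = 1 and the function names are illustrative choices.

```python
from math import gamma

def M_direct(j, c, k):
    """M_j from formula (57)."""
    a = (j + 1) / c
    return gamma(a) * gamma(k - a) / (c * gamma(k))

def M_by_recursion(j, c, k_start, k_end):
    """Step M_j from k_start up to k_end using M_j(k+1) = M_j(k) * (kc - j - 1) / (kc)."""
    value = M_direct(j, c, k_start)
    for k in range(k_start, k_end):
        value *= (k * c - j - 1) / (k * c)
    return value

c, j = 3, 1
print(M_by_recursion(j, c, 1, 11), M_direct(j, c, 11))
```

Agreement at k = 11 confirms every intermediate k as well, which is the check described in the text.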


TABLE II

Mean $\mu'_1$ and Standard Deviation $\sigma$ for $F(x) = 1 - \frac{1}{(1 + x^c)^k}$


(In each cell the upper number is $\mu'_1$ and the lower number is $\sigma$)



1 

2 

3 

4 

5 

6 

7 

8 

9 

10 

1 

— 

1.57080 

1 20920 
97787 

1 11072 
.58060 

1.06896 

42266- 

1.04720 

.33552 

1.03438 

27953 

1.02617 

24019 

■ 


•2 


.78540 

.61899 


.83304 

30239 

.85517 

24794 

87266 

.21116 

.88661 

18433 

89790 

16375 

H 

.91498 

.13411 

3 

.600000 

86603 

.58905- 

.39118 

.67178 

29349 

.72891 

.24029 

.76965" 

.20461 

.79994 

17852 

.82328 
.15847 

.84178 

14253 

.85680 

.12953 

86923 

11872 

4 

.33333 

.47140 

49087 

.30393 

.59714 

.24784 

1 

.66817 .71834 

21077. 18344 

1 

.75550 

16234 

78408 
.14555- 

.80671 

13187 

.82507 

12053 

.84025 

11097 

6 


.42951 

-.25690 


.626411 .68242 
.192691 .17028 

72402 

.15220 

.76607 

.13743 

.78160- 
,12617 

.80216- 

.11487’ 


6 

20000 

24496- 

.38656 

.22488 

51088 

20220 


.65513 

.16103 

.60989 

.14505- 

.73447 

.13168 

76196 

.12043 

78432 

.11087 

80286 

.10266 

7 

.16607 

19720 

.35435- 

.20274 

.48250 

.18861 

.57029 

.17064 

.63329 

16403 

.68045- 

.13962 

71698 

.12731 

.74609 

.11682 

.76980 

.10782 


S 

.14286 
.16496 

32904 

.18599 

.45952 

17783 

.54992 
.16316 

,61520 

14846 

.66425- 

13528 

.70236 

.12383 

73276 

11394 

,76768 
.10539 

.77820 

.09796 

9 

.12500 

14174 

30847 
.17276 

■ 

.53274 

15704 

.59982 

14389 

66041 

.13171 

.68981 

12095- 

.72131 

.11156 


Hj 

10 

.11111 

.12423 

.29134 

16197 

.42407 

.16197 

.51794 

.15190 

58649 

.14002 

.63836 

.12869 

.67886 

.11861 

.71130 

.10964 

.73783 

.10168 

B 

11 

.10000 
.11055 

27677 

.15297 

.40993 

.15684 

50499 ! ,57470 

.14749; .13669 

1 

, 02772 
.12608 

.66916 
.11640 

.70240 

10780 

.7296^ 

.10021 

.75231 

.09351 


It will be seen from Table II that in most cases the values of $\mu'_1$ and $\sigma$ lie within useful ranges. The graph shows the general relationship between $\alpha_3$, k and c. The curves are the traces of planes k = 1, 2, ... upon the surface $\alpha_3 = G(c, k)$. Other traces would contain all pairs (c, k) giving a fixed $\alpha_3$.














































2,Sn ,884 347 065 -.116 

10 17,83 4 122 3,043 2 883 2.028 

2,714 868 .329 .050 - 128 

11 16.-18 4.018 3.000 2 806 2.020 


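The surfaces for $\mu'_1$, $\sigma$ and $\alpha_4$ are more irregular. The problem of determining a cumulative function with $\alpha_3 = a$ and $\alpha_4 = b$ is equivalent to the problem of determining a point of intersection of the curves

(58)  $\alpha_3 = G(k, c) = a, \qquad \alpha_4 = H(k, c) = b.$

Direct algebraic solution of this system appears very difficult, and other techniques must be resorted to.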


BURH 

[II 


“1 -j 

I anti Ihc lourr number is on) 


6 

7 

B 

9 

10 

1 .S20 

14.77 

1.4,68 
10 30 

1 .22,6 

8,342 

1.060 

7 21.6 

.9.37 

0..610 

.134 

4.106 

.294 

3.859 

.190 

3.7.36 

,109 

3,073 

M4 

3.646 

.119 

3.3.68 

.00,6- 

3.329 

- 08,3 
3.343 

-.152 
3.,378 

-.208 

3.418 

-.010 
3.169 

-.125 

3.205 

-.207 

3 263 

-.271 

3.327 

-.325 

3 393 

-.097 
3 fH)8 

-.199 

3.165 

- 277 

3 243 

- 340 

3 324 

- 391 
3.401 

~ 147 
3.00.6 

-.210 

3.150- 

- 323 
3.241 


-.43,6 

3.416 

- 181 
3.0-18 

- 279 
3.144 

-.3.65- 

.3.2.14 

-.415 
3,.330 

-.465 

3.430 


-.303 

3.143 

-.378 

3.248 

-.438 

3.349 


- 220 
3.033 

- ,322 
3.143 

-.396 

3 252 


B 

-.242 

3.030 

-..336 

3 144 

-.410 

3.257 

-.470 

3.364 

-.519 

3.162 

-.25-1 

3.027 

- 348 

3 140 

— ,422 

3 201 

-.481 

3.371 

-.530 

3.470 


egulnr. The prolilem of determining 
1 = h i.s equivalent to the problem of 
le curvi's 


«3 = a 


a^ = 1 >, 


w veiy difficult, and other tech- 



































































One method is to use only integral values of k, and then for each k interpolate for the value of c giving the desired $\alpha_3$. For such pairs of c and k find $\alpha_4$ by interpolation. Then choosing the pairs having $\alpha_4$ just above and just below the desired one, the proper linear combination (51) is taken. This gives a combination function which has both $\alpha_3$ and $\alpha_4$ at the desired values. This combination will be an approximation to the single function with non-integral k, having the given $\alpha_3$ and $\alpha_4$. This method of linear combinations might be extended to fit $\alpha_5$ by using three integral values of k.

The interpolations may be done graphically by use of Figure 1 and others like it. Or one may use Stirling's formula [6]. The interpolation for c from $\alpha_3$ is backwards, while that for $\alpha_4$ from c is direct. Sometimes it is more accurate to use Newton's formula [5, p. 30] when the values in one direction increase rapidly.

Use of a single function F(x) for a graduation is easily accomplished. First, obtain the c and k to be used so that $\alpha_3$ is correct and $\alpha_4$ is as close to the given value as possible. Then determine $\mu'_1$ and $\sigma$ from Table II by interpolation. Change the scale and origin of the original values of the variable X to those x's corresponding to $F(x) = 1 - 1/(1 + x^c)^k$ through

(59)  $\frac{x - \mu'_1}{\sigma} = \frac{X - M}{S},$

where M and S are the mean and standard deviation of the given distribution. Now compute the values of $1/(1 + x^c)^k$ for the various values of x. The differences of these results are equal to the differences of F(x), which by (1) are the probabilities for the given ranges of X. Multiplication by the total frequency will yield the theoretical frequencies, if desired.
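
The procedure can be sketched as below, using the constants of the first illustration that follows (c = 6, k = 4, M = .02085, S = 2.5723, $\mu'_1$ = .75550, $\sigma$ = .16234, total 8585). The function names and the exact list of class limits are assumptions made for the example, so the printed frequencies need not agree with Table IV in the last figures.

```python
def burr_cdf(x, c, k):
    """F(x) = 1 - (1 + x^c)^(-k) for x >= 0, and 0 otherwise, as in (52)."""
    return 1.0 - (1.0 + x ** c) ** -k if x > 0 else 0.0

def graduate(class_limits, total, M, S, mu1, sigma, c, k):
    """Map class limits X to x by (59), then difference F and scale by the total."""
    xs = [mu1 + sigma * (X - M) / S for X in class_limits]
    Fs = [burr_cdf(x, c, k) for x in xs]
    return [total * (hi - lo) for lo, hi in zip(Fs, Fs[1:])]

limits = [X - 0.5 for X in range(-11, 15)]          # -11.5, -10.5, ..., 13.5
freqs = graduate(limits, 8585, M=0.02085, S=2.5723,
                 mu1=0.75550, sigma=0.16234, c=6, k=4)
for X, f in zip(range(-11, 14), freqs):
    print(X, round(f, 2))
```

Only evaluation of F(x) at the transformed class limits and differencing are involved.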

If the graduation is to be done by a combination of two functions, the work is carried out for each as described above, and then the frequencies are combined by the same linear combination as that by which the component $\alpha_4$'s must be combined to give the desired $\alpha_4$. This may readily be seen by considering the separate cumulative functions in terms of the standard variable t, whence the means and $\sigma$'s are 0 and 1 and (51) is applicable. Then the differences of $G(t) = k_1G_1(t) + k_2G_2(t)$ are sought. But these can be found by taking the same linear combination of the separate differences of the functions $G_1(t)$ and $G_2(t)$. However, these values are merely computed from their respective sets of x values.

For illustration, three graduations are given. The first is a highly normal distribution of heights from Rietz [5, p. 98ff.]. For this distribution, M = .02085, S = 2.5723, $\alpha_3$ = -.0124, $\alpha_4$ = 3.149. The graduation was done by taking the function $F(x) = 1 - \frac{1}{(1 + x^6)^4}$, which has the nearly normal characteristics $\alpha_3$ = -.019, $\alpha_4$ = 3.169. The object was to take a simple cumulative function with integral k and c to show how a satisfactory job can be done on a normal distribution. For this function $\mu'_1$ = .75550 and $\sigma$ = .16234. Then

x = .063110X + .75418,

into which are substituted the X class-limits -11.5, -10.5, etc. From these, corresponding values of $\frac{8585}{(1 + x^6)^4}$ are calculated and differenced to give the theoretical frequencies for the 8585 cases. The results are given in Table IV.

The fit obtained by use of F(x) is good. One comparison test is that of $\chi^2$. The eight classes -11, -10, -9, 9, 10, 11, 12, 13 were grouped together. The results were

$\chi^2_F = 21.210, \qquad \chi^2_N = 23.479,$


as compared to

$P(\chi^2 > 22.31) = .10,$

$P(\chi^2 > 19.31) = .20,$

for 15 degrees of freedom (18 classes minus 3 for linear restrictions). One reason for the somewhat lower $\chi^2$ for F(x) may be that its $\alpha_3$ and $\alpha_4$ are closer to

TABLE IV 


.V 1 

Ob&ervcd /rcquency |5) 

Graduated frequency by 
Pfar) 

Graduated frequency of 
normal [5] 

-11 0 


00 

16“ 

-10.0 I 

2 

.43 

67 

-0.0 I 

4 

3 23 

2 84 

-8.0 

14 

13.17 

10.30 

-7 0 I 

41 

39.81 

32 11 

-0.0 ! 

83 

97,87 

86 03 

-,L0 ! 

100 

200,72 

198 17 

-4 0 


385.3t 

392,43 

-,'i (1 1 

060 

030 55 

668.11 

1 

090 

941 98 

977.92 

-1.0 

1223 

1210.47 

12,30 63 

.0 

1320 

1353,08 

1331 41 

1 0 

1230 

1278 39 

1238 41 

2 0 

1003 

1013 SO 

090,33 

3.0 

MG 

076 12 

[ 680 86 

4.0 

■■■■ 







0.0 

' 70 



r.o 

32 



8.0 

10 


10 84 

0 0 

5 

5 33 

3.01 

10.0 

2 

2 01 

72 

11.0 


.77 

,15 

12.0 


.30 

03 

13,0 


10 “ 


Total, 

i 85.S.'; 

8585,00 

8684 99 


those of the observed distribution than are those of the normal function. This gives a better fit in the tails of the distribution. Nevertheless, this example does illustrate how one of the simplest of the cumulative functions with "normal" characteristics can be used without specifically fitting $\alpha_3$ and $\alpha_4$. It may also be mentioned that F(x) for c = 5, k = 6 has $\alpha_3$ and $\alpha_4$ even closer to the normal

a Total of stump frequency.










TABLE V 


x 

Observed 

frequency 

PM. i - 4 
c "• 3.238./i 

F(x), A M S ! 
c « l,9Ui/i 

F(x) i 

Type III (SI 

-8 0 

3 

.00 

00 

tx) i 


-7.0 

9 

so 

(X) i 

27 ■ 

2 

-6 0 

la 

39 58 j 

25.07 1 

2n.r)2 ] 

27 

-5 0 

167 

l.SO 78 ! 

175 27 i 

170 90 ; 

M2 

—1.0 

372 

433 SO I 

415 79 

442.1.3 

410 

-3 0 

718 

768 83 

791.72 i 

781 71 ! 

700 

-2 0 

1186 

1110 06 

1131.52 

1128 ,80 ; 

1180 

-1.0 

1162 

1383.00 

1384.99 

1.381.40 j 

1441 

.0 

1498 

1492.04 

1477.80 

1482.20 

1502 

+1,0 

1460 

1419.70 

1399.70 

140,5 ,83 j 

138.5 


1142 

1205.81 

1100.80 

1195.40 j 

1158 


913 

920 59 

921.47 

923.01 ! 

SOI 


642 

654 00 

650.82 

065.90 ! 

041 


435 

430.66 

436.78 

434 90 

434 


235 

268 70 

274.63 ; 

' 272.81 1 

1 j 

280 

7,0 

167 

101 10 

165.27 ! 

i 10.3.90 1 

173 

8.0 

133 

93 90 

90 23 

: 9,5.55 

1 102 

9.0 

47 

53 88 

51 77 

54 50 

1 09 

10 0 

29 

30.02 

30.70 

30.08 ' 

.33 

11.0 

13 

17.,37 

17 07 

j 17 10 j 

1 18 

12.0 

9 

0.80 

0 40 

j 9,58 1 

! 9 

13.0 

5 

5.04 

6.20 

6.3S 

6 

14 0 

8 

3.20 

2.03 

. 3.03 

1 2 

15.0 

2 

1.89 

1.60 

1.73 i 

1 1 

16 0 


1.12 

.93 

* .09 1 

i 

17.0 


.06 

.63 

! 67 


IS 0 


.41 

.31 

1 .34 


10.0 


.21 

.18 

.20 


20.0 


.16 

.11 

.13 


21.0 


27< 

.17* 

.20* 


Total . , 

i 10701 

10701 00 

19701.00 

1 10700 99 

10701 


TABLE VI 


Obaerved (0] 

Type nr [o| 

Type A|6i 

Edgeworth [0| 1 

PM 

3 

4 

6 

4 

4 

20 

17 

22 

17 

19 

38 

42 

47 

42 

42 

63 

60 

80 

59 

50 

51 

63 

SO 

63 

62 

29 

33 

27 

32 

34 

21 

15 

13 

IS 

10 

4 

5 

4 

6 

6 

0 

1 

1 

2 

1 

1 

0 

0 

1 

0 

230 

229 

220 

231 

2 M 


4 54 

7.65 

5 86 

4.03 


Stump frequency, 


230 







values, but it does not give quite as good a fit because it tends to decrease too rapidly on the left.

The second example is also from Rietz [5, p. 108ff.]. For this distribution, M = .68335, S = 2.9180, $\alpha_3$ = .583 and $\alpha_4$ = 3.698. Two functions were used with k = 4 and k = 5. By interpolation


              c      $\mu'_1$    $\sigma$     $\alpha_3$    $\alpha_4$
   k = 5    2.911    .54200    .22247    .583    3.655
   k = 4    3.228    .61577    .23823    .583    3.795


Because of the rather rapid increases for smaller values of c, Newton's formula [5, p. 30] yields better approximations than Stirling's [5, p. 38 (12)]. The graduation for each function is carried out as above, and since

$.3063 \cdot 3.795 + .6937 \cdot 3.655 = 3.698,$

the linear form

$.3063 f_1 + .6937 f_2 = f$

is used.
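
The weights follow from requiring the combination of the two component $\alpha_4$ values to equal the observed one. A minimal sketch, with illustrative names, is given below; because it starts from the rounded $\alpha_4$ values, it returns about .307 and .693 rather than the .3063 and .6937 of the text, which were presumably found from unrounded values. The two component frequencies are illustrative inputs.

```python
def mixing_weights(target, a_hi, a_lo):
    """Weights (w_hi, w_lo), summing to 1, with w_hi*a_hi + w_lo*a_lo = target."""
    w_hi = (target - a_lo) / (a_hi - a_lo)
    return w_hi, 1.0 - w_hi

w4, w5 = mixing_weights(3.698, a_hi=3.795, a_lo=3.655)
print(round(w4, 4), round(w5, 4))

# Combining two illustrative component frequencies for one class:
f4, f5 = 768.83, 791.72
print(round(w4 * f4 + w5 * f5, 2))
```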

Table V gives the component and eombmcd frcciuencies, and also the fre- 
eiuencics from a 'ryiie III. x' l^oth are very high even though the fit appears 
reasonably good on a graiih. This rc.sult is due to clns,ses 0 and 8 which tend to 
cau.se a higli x* f<»r any di.stribution function of a .small number of parameters. 
Tlie example, however, docs .show that F(x) can be used to graduate a skewed 
dlstiibution 

It i.s to be further noted that the component functions were used only to 
obtain an approximation to a .single, function with -1 < b < 5, for which aa and ai 
are .siiuuUancou.sly correct. When tables more complete than Tables II and 
III are available, .such a .single, function can be found 

The third example of graduation is from Elderton [6]. The measures were
treated as a discrete variable in computing $\alpha_3$ and $\alpha_4$. A single function,
$c = 3.102$, $k = 11$, was used. This function had $\alpha_3$ at the observed value of
.2930, while $\alpha_4$ was 2.973 as compared to the observed 2.986. The results, along
with those by classical methods, are shown in Table VI. The above $\chi^2$ were
obtained by grouping the first and the last three class frequencies. The values
are approximate because of rounding. However, they do show that F(x) does
a comparable graduation.

Besides aiding in the problem of graduation, this cumulative function should
prove of value in the approximation of known or population distributions, as
for example, $(p + q)^n$. However, much more work needs to be done before this
can be more than a conjecture.

7. Conclusion. This paper has stressed the advantages obtained by the direct
use of the cumulative function. A number of useful functions have been
considered. A general method for fitting any cumulative function by the
construction of a table has been suggested. A particular method depending



IRVING W. BURR


















NOTES 

This section is devoted to brief research and expository articles, notes on methodology
and other short items.


AN APPROXIMATE NORMALIZATION OF THE ANALYSIS OF 
VARIANCE DISTRIBUTION 


By Edward Paulson¹
Columbia University


The statistic $F = s_1^2/s_2^2$, where $s_1^2$ and $s_2^2$ are two independent estimates of the
same variance, has played an essential part in modern statistical theory. All
tests of significance involving the testing of a linear hypothesis, which includes
the analysis of variance and covariance and multiple regression problems, can
be reduced to finding the probability integral of the F distribution. This
distribution (and the equivalent distribution of $z = \tfrac{1}{2} \log F$) has so far been
directly tabulated only for the 20, 5, 1, and 0.1 percent levels of significance [1].
To find the critical value of F for some other probability level would require
the use of Pearson's extensive triple-entry tables [2], which are not very
convenient to use for this purpose, and in addition are inadequate for some ranges
of the parameters.

It therefore appears that it might be of some practical value to have an
approximate method of determining the critical values of F for other probability
levels. A solution will be given based on a modified statistic U, a function of F,
so selected as to tend to have a nearly normal distribution with zero mean and
unit variance. This normalized statistic will have the additional advantage
that further tests are possible with normalized variates, as pointed out by
Hotelling and Frankel [3].

F can be written in the form

$$F = \frac{\chi_1^2/n_1}{\chi_2^2/n_2},$$

where $\chi_1^2$ and $\chi_2^2$ have the chi-square distribution with $n_1$ and $n_2$ degrees of freedom
respectively. It is known from the work of Wilson and Hilferty [4] that
$(\chi^2/n)^{1/3}$ is nearly normally distributed with mean $1 - 2/(9n)$ and variance $2/(9n)$. An
obvious approach to the problem of securing an approximation to the F distribution
is to regard $F^{1/3}$ as the ratio of two normally distributed variates. In general
the distribution of the ratio $v = y/x$, where $y$ and $x$ are normally and independently
distributed with means $m_y$ and $m_x$ and standard deviations $\sigma_y$ and $\sigma_x$,


¹ Work done under a grant-in-aid from the Carnegie Corporation of New York.



is not expressible in simple form. However Fieller [5] has shown that a function
R of v, namely

$$R = \frac{v\,m_x - m_y}{\sqrt{v^2 \sigma_x^2 + \sigma_y^2}},$$

will be nearly normally distributed with zero
mean and unit variance, provided the probability of x being negative is small.
In the given problem it follows that we can regard

$$U = \frac{\left(1 - \dfrac{2}{9n_2}\right) F^{1/3} - \left(1 - \dfrac{2}{9n_1}\right)}{\sqrt{\dfrac{2}{9n_2} F^{2/3} + \dfrac{2}{9n_1}}} \tag{1}$$

as nearly normally distributed (with zero mean and unit variance) provided
$n_2 \ge 3$, for with $n_2 = 3$ the probability of the denominator of $F^{1/3}$ being negative
is only .0003. If it is desired to use the lower tail of the F distribution, then the
statistic U should only be used if $n_1$ is also $\ge 3$. Ordinarily, in most
applications only the upper tail of the F distribution is used, and $n_2$, which corresponds
to the number of degrees of freedom in the estimate of the error variance, will
be much greater than 3.
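As an illustration, the statistic (1) and the quoted probability .0003 can be evaluated with a few lines of code. This is a minimal sketch, not part of the original note: the function names are hypothetical, and the probability is computed from the Wilson and Hilferty normal approximation (mean 1 - 2/(9n), variance 2/(9n)) rather than from the exact distribution.

```python
from math import sqrt
from statistics import NormalDist

def paulson_u(F, n1, n2):
    """Statistic U of (1): roughly N(0, 1) when F has (n1, n2) degrees of freedom."""
    a = 1 - 2 / (9 * n2)
    b = 1 - 2 / (9 * n1)
    x = F ** (1 / 3)
    return (a * x - b) / sqrt(2 / (9 * n2) * x * x + 2 / (9 * n1))

def prob_cube_root_negative(n):
    """Normal-approximation probability that (chi^2 / n)^(1/3) falls below zero."""
    return NormalDist(mu=1 - 2 / (9 * n), sigma=sqrt(2 / (9 * n))).cdf(0.0)

p_neg = prob_cube_root_negative(3)   # the text quotes .0003 for n_2 = 3
u_example = paulson_u(3.84, 4, 8)    # 3.84 is the tabulated 5% point for (4, 8)
print(round(p_neg, 4), round(u_example, 2))
```

For the tabulated 5 percent point the statistic comes out close to the normal deviate 1.645, which is what the approximation asserts.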

The following tables show the degree of accuracy of the approximation. The
exact values of F corresponding to various levels of significance are compared


              n_1 = 1, n_2 = 10
                     √F
   P       Approximation    Exact Value
  .20          1.37            1.37
  .05          2.21            2.23
  .01          3.16            3.17
  .001         4.63            4.59
  .0001        6.40            6.22


              n_1 = 4, n_2 = 8               n_1 = 6, n_2 = 12
                     F                               F
   P       Approximation  Exact Value     Approximation  Exact Value
  .99          .058          .068             .123          .130
  .95          .161          .166             .248          .250
  .80          .407          .406             .497          .496
  .20         1.92          1.92             1.72          1.72
  .05         3.84          3.84             3.00          3.00
  .01         7.12          7.01             4.85          4.82
  .001       15.38         14.39             8.58          8.38





with the approximate values, which are found by solving (1) for F by considering
it as a quadratic equation in $F^{1/3}$. In these tables $P = \int_F^\infty \varphi(F)\,dF$, where $\varphi(F)$
is the probability distribution of F. The case $n_1 = 1$ is of special interest,
since here $F = t^2$, where t has Student's distribution, and is shown separately.
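The solution of (1) as a quadratic in $F^{1/3}$ can be sketched as follows. This is an illustrative implementation under stated assumptions, not the author's own procedure: the function name is hypothetical, U is set equal to the normal deviate with upper-tail area P, the equation is squared, and the root is chosen so that the numerator of (1) has the sign of that deviate. It also assumes the leading coefficient of the quadratic is positive, which holds for the cases tabulated above.

```python
from math import sqrt
from statistics import NormalDist

def paulson_critical_f(P, n1, n2):
    """Approximate value of F whose upper-tail probability is P."""
    u = NormalDist().inv_cdf(1 - P)            # normal deviate with upper-tail area P
    a, b = 1 - 2 / (9 * n2), 1 - 2 / (9 * n1)
    c, d = 2 / (9 * n2), 2 / (9 * n1)
    # (a*x - b)^2 = u^2 * (c*x^2 + d)  becomes  A*x^2 - 2*a*b*x + C = 0,  x = F^(1/3)
    A = a * a - u * u * c                      # assumed positive here
    C = b * b - u * u * d
    root = sqrt((a * b) ** 2 - A * C)
    # pick the root for which a*x - b has the same sign as u
    x = (a * b + root) / A if u >= 0 else (a * b - root) / A
    return x ** 3

f_4_8 = paulson_critical_f(0.05, 4, 8)             # table: 3.84
t_1_10 = sqrt(paulson_critical_f(0.01, 1, 10))     # table: 3.16 (square root of F)
print(round(f_4_8, 2), round(t_1_10, 2))
```

Evaluating the function at the tabulated levels gives values that agree with the "Approximation" columns to the two or three figures shown.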


REFERENCES

[1] R. A. Fisher and F. Yates, Statistical Tables for Biological, Agricultural, and Medical
Research, London, 1938.

[2] Karl Pearson (Editor), Tables of the Incomplete Beta Function, Biometric Laboratory,
London, 1934.

[3] H. Hotelling and L. R. Frankel, "The transformation of statistics to simplify their
distribution," Annals of Math. Stat., Vols. 8-9 (1937-38), pp. 87-96.

[4] E. B. Wilson and M. M. Hilferty, "The distribution of chi-square," National Acad.
Sc. Proc., Vol. 17 (1931), pp. 684-688.

[5] E. C. Fieller, "The distribution of the index in a normal bivariate population,"
Biometrika, Vol. 24 (1932), pp. 428-440.


NOTE ON THE DISTRIBUTION OF ROOTS OF A POLYNOMIAL WITH 
RANDOM COMPLEX COEFFICIENTS 


By M. A. Girshick


United States Department of Agriculture


In order to obtain the distribution of roots of a polynomial with random
complex coefficients, it was found convenient to employ a rather well known
theorem on complex Jacobians. Since proofs of this theorem are not very
plentiful in the literature, a brief and simple proof of it is presented in this note.

Theorem: Let n analytic functions be defined by


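$$w_p = u_p + i v_p = f_p(z_1, z_2, \ldots, z_n), \qquad (p = 1, 2, \ldots, n), \tag{1}$$

where $z_p = x_p + i y_p$, $i = \sqrt{-1}$. Let $j$ denote the Jacobian of the
transformation of the n complex variables defined by (1). That is

$$j = \begin{vmatrix} \dfrac{\partial w_1}{\partial z_1} & \cdots & \dfrac{\partial w_1}{\partial z_n} \\ \vdots & & \vdots \\ \dfrac{\partial w_n}{\partial z_1} & \cdots & \dfrac{\partial w_n}{\partial z_n} \end{vmatrix}. \tag{2}$$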


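Let furthermore $J$ denote the Jacobian of the transformation of the 2n real variables
defined by the equations $u_p = u_p(x_1, x_2, \ldots, x_n; y_1, y_2, \ldots, y_n)$ and
$v_p = v_p(x_1, x_2, \ldots, x_n; y_1, y_2, \ldots, y_n)$, $(p = 1, 2, \ldots, n)$. That is

$$J = \begin{vmatrix}
\dfrac{\partial u_1}{\partial x_1} & \cdots & \dfrac{\partial u_1}{\partial x_n} & \dfrac{\partial u_1}{\partial y_1} & \cdots & \dfrac{\partial u_1}{\partial y_n} \\
\vdots & & \vdots & \vdots & & \vdots \\
\dfrac{\partial u_n}{\partial x_1} & \cdots & \dfrac{\partial u_n}{\partial x_n} & \dfrac{\partial u_n}{\partial y_1} & \cdots & \dfrac{\partial u_n}{\partial y_n} \\
\dfrac{\partial v_1}{\partial x_1} & \cdots & \dfrac{\partial v_1}{\partial x_n} & \dfrac{\partial v_1}{\partial y_1} & \cdots & \dfrac{\partial v_1}{\partial y_n} \\
\vdots & & \vdots & \vdots & & \vdots \\
\dfrac{\partial v_n}{\partial x_1} & \cdots & \dfrac{\partial v_n}{\partial x_n} & \dfrac{\partial v_n}{\partial y_1} & \cdots & \dfrac{\partial v_n}{\partial y_n}
\end{vmatrix}. \tag{3}$$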


Then $J$ equals the square of the modulus of $j$.

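Proof: Since by hypothesis $w_p$ is analytic we can set
$\dfrac{\partial w_p}{\partial z_q} = \dfrac{\partial v_p}{\partial y_q} - i \dfrac{\partial u_p}{\partial y_q}$.
Hence $j$ takes on the form:

$$j = |\,V_y - iU_y\,|, \tag{4}$$

where $U_y$ and $V_y$ denote the $n \times n$ matrices $(\partial u_p/\partial y_q)$ and
$(\partial v_p/\partial y_q)$, and $U_x$, $V_x$ are defined in the same way.

Again, since $w_p$ is analytic, we have
$\dfrac{\partial u_p}{\partial x_q} = \dfrac{\partial v_p}{\partial y_q}$ and
$\dfrac{\partial v_p}{\partial x_q} = -\dfrac{\partial u_p}{\partial y_q}$. That is,
$U_x = V_y$ and $V_x = -U_y$. Hence $J$ in (3) has the value

$$J = \begin{vmatrix} V_y & U_y \\ -U_y & V_y \end{vmatrix}. \tag{5}$$

Now $J$ can also be written in the form

$$J = \begin{vmatrix} V_y & iU_y \\ iU_y & V_y \end{vmatrix}. \tag{6}$$

This follows from the fact that if we multiply each of the last n rows of the
expression for $J$ in (6) by $i$ and factor out $i$ from the last n columns, we get the
expression for $J$ given in (5).

Now in (6) subtract the $(n + p)$th row from the $p$th row for each
$p = 1, 2, \ldots, n$. This yields:

$$J = \begin{vmatrix} V_y - iU_y & iU_y - V_y \\ iU_y & V_y \end{vmatrix}. \tag{7}$$

Next add in (7) the $p$th column to the $(n + p)$th column for each $p = 1, 2, \ldots, n$.
This yields:

$$J = \begin{vmatrix} V_y - iU_y & 0 \\ iU_y & V_y + iU_y \end{vmatrix} = |\,V_y - iU_y\,| \cdot |\,V_y + iU_y\,|. \tag{8}$$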









But (8) is precisely the square of the modulus of $|\,V_y - iU_y\,|$. This in
conjunction with (4) proves the theorem.
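The theorem can be checked numerically on a small case. The sketch below is illustrative only: the analytic map f(z1, z2) = (z1^2 + z2, z1 z2) is a hypothetical example chosen here, its complex Jacobian is written out by hand, and the real 4 x 4 Jacobian is formed by central differences in the variable order (x1, x2, y1, y2) with outputs ordered (u1, u2, v1, v2).

```python
def det(m):
    """Determinant by Laplace expansion along the first row (fine for small matrices)."""
    if len(m) == 1:
        return m[0][0]
    return sum((-1) ** c * m[0][c] * det([row[:c] + row[c + 1:] for row in m[1:]])
               for c in range(len(m)))

def f(z1, z2):
    """A hypothetical analytic map (w1, w2) of two complex variables."""
    return (z1 * z1 + z2, z1 * z2)

def real_map(x1, x2, y1, y2):
    """The same map on real variables; returns (u1, u2, v1, v2)."""
    w1, w2 = f(complex(x1, y1), complex(x2, y2))
    return (w1.real, w2.real, w1.imag, w2.imag)

def real_jacobian(point, h=1e-4):
    """Matrix of partial derivatives of (u1, u2, v1, v2) by central differences."""
    cols = []
    for k in range(4):
        up, dn = list(point), list(point)
        up[k] += h
        dn[k] -= h
        fu, fd = real_map(*up), real_map(*dn)
        cols.append([(a - b) / (2 * h) for a, b in zip(fu, fd)])
    # cols[k][r] is d(output r)/d(input k); transpose so that rows index outputs
    return [[cols[k][r] for k in range(4)] for r in range(4)]

z1, z2 = complex(1, 2), complex(0.5, -1)
j = det([[2 * z1, 1], [z2, z1]])                              # complex Jacobian of f
J = det(real_jacobian((z1.real, z2.real, z1.imag, z2.imag)))  # real Jacobian
print(J, abs(j) ** 2)
```

Because the example map is quadratic, the central differences are exact up to rounding, so the two printed numbers should agree closely.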

Consider the equation

$$z^n - a_1 z^{n-1} + a_2 z^{n-2} - \cdots + (-1)^n a_n = 0, \tag{9}$$

where the $a_p$ are complex numbers. We may wish to consider the real and
imaginary parts of $a_p$ as random variables having a given joint distribution
function, and require to find the probability that one or more roots of (9) will
lie in a specified region of the complex plane. In order to answer this question,
it is necessary to find the joint distribution of the real and imaginary parts of
the roots of (9).

As an example let us assume that the real and imaginary parts of $a_p$ are normally
and independently distributed with zero mean and variance $\sigma^2$. That is,
we assume that the distribution density of these quantities is given by

$$\frac{1}{(2\pi\sigma^2)^n} \exp\left\{-\frac{1}{2\sigma^2} \sum_{p=1}^{n} a_p \bar{a}_p\right\}, \tag{10}$$

where $\bar{a}_p$ is the conjugate of $a_p$. Let $z_1, z_2, \ldots, z_n$ be the roots of (9). The
relationships between the roots and coefficients of (9) are given by

$$a_1 = \sum_{i=1}^{n} z_i, \quad a_2 = \sum_{j<k} z_j z_k, \quad \ldots, \quad a_n = z_1 z_2 \cdots z_n. \tag{11}$$

Thus the $a_p$'s are analytic functions of the $z$'s.

In order to find the joint distribution of the real and imaginary parts of the
$z$'s, it is necessary to find the real Jacobian $J$ of the transformation defined
by (11). Now the complex Jacobian $j$ of the transformation (11) is defined as

$$j = \begin{vmatrix} \dfrac{\partial a_1}{\partial z_1} & \cdots & \dfrac{\partial a_1}{\partial z_n} \\ \vdots & & \vdots \\ \dfrac{\partial a_n}{\partial z_1} & \cdots & \dfrac{\partial a_n}{\partial z_n} \end{vmatrix}. \tag{12}$$

A simple calculation will show that the value of $j$ in (12) is given by

$$j = \prod_{p<q} (z_p - z_q). \tag{13}$$

Hence, applying the theorem proved above, we get

$$J = |j|^2 = \prod_{p<q} |z_p - z_q|^2, \tag{14}$$

where the symbol $|\ |$ stands for the modulus.
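The "simple calculation" leading to (13) and (14) can be spot-checked numerically. The sketch below is an illustration under stated assumptions: it takes the coefficients to be the elementary symmetric functions of the roots, as in (11), uses the fact that the derivative of the p-th elementary symmetric function with respect to one root is the (p-1)-th elementary symmetric function of the remaining roots, and compares the determinant with the product of differences for n = 3. All function names are hypothetical.

```python
from itertools import combinations
from math import prod

def det(m):
    """Determinant by Laplace expansion along the first row (small matrices only)."""
    if len(m) == 1:
        return m[0][0]
    return sum((-1) ** c * m[0][c] * det([row[:c] + row[c + 1:] for row in m[1:]])
               for c in range(len(m)))

def elementary_symmetric(values, k):
    """Sum of all products of k distinct entries; equals 1 for k = 0."""
    return sum(prod(c) for c in combinations(values, k))

def coefficient_jacobian(z):
    """Matrix (da_p/dz_q), p = 1..n, for a_p the p-th elementary symmetric function."""
    n = len(z)
    return [[elementary_symmetric(z[:q] + z[q + 1:], p - 1) for q in range(n)]
            for p in range(1, n + 1)]

def difference_product(z):
    """Product of (z_p - z_q) over all pairs p < q."""
    return prod(z[p] - z[q] for p, q in combinations(range(len(z)), 2))

roots = [complex(1, 1), complex(2, -1), complex(-0.5, 0.25)]
j = det(coefficient_jacobian(roots))
print(abs(j) ** 2, abs(difference_product(roots)) ** 2)
```

The squared modulus of the determinant and the product of squared distances between roots should agree, as (14) states.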







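From (10) and (14) we conclude that the joint distribution density of the real
and imaginary parts of the roots of (9) is given by

$$\frac{1}{(2\pi\sigma^2)^n} \exp\left\{-\frac{1}{2\sigma^2} \sum_{p=1}^{n} a_p \bar{a}_p\right\} \prod_{p<q} |z_p - z_q|^2, \tag{15}$$

where the $a_p$ are expressed in terms of the roots by (11).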


A NOTE ON THE PROBABILITY OF ARBITRARY EVENTS

By Hilda Geiringer¹
Bryn Mawr College

In a recently published paper [1] on arbitrary events the author studies the
probability of the occurrence of at least $m$ among $n$ events. Denoting by
$p_m(\gamma_1, \gamma_2, \ldots, \gamma_r)$ the probability that at least $m$ among the $r$ events
$E_{\gamma_1}, \ldots, E_{\gamma_r}$ occur, and by $p_{[\alpha_1 \cdots \alpha_r]}$ the probability of the non-occurrence of the
events numbered $\alpha_1, \alpha_2, \ldots, \alpha_r$ and of the occurrence of the $n - r$ others, he
proves

-pi(ar+l (•••««) + £ Pl(ri I ar+l» • • • Otn) ~ S Z) Pl^Tl . fs I «M I , • ’ ’ «n) 

(I) ■'* 

+ ••• + (-I)' Z piO. ’" «) pi «,..-«,]■ 

(Theorem VI, page 336). From (I) he deduces that a necessary and sufficient
condition for the existence of a system of events $E_1, \ldots, E_n$ associated with
given values $p_1(\alpha_1, \ldots, \alpha_r)$ is that the expressions on the left side of (I) computed
from these $p_1$'s are $\ge 0$ for all possible combinations of the $\alpha$'s (Theorem VII).
He also points out that it was not possible to find similar (necessary and
sufficient) conditions for $m > 1$. I wish to show in this note the relation between
these theorems and some well known basic facts of the theory of arbitrarily
linked events and to add some remarks.

1. Given $n$ chance variables $x_i$ $(i = 1, \ldots, n)$, denote by $x_i = 1$ the
"occurrence of $E_i$", by $x_i = 0$ its non-occurrence, and by $v(x_1, x_2, \ldots, x_n)$ the
probability of "the result $(x_1, x_2, \ldots, x_n)$", i.e., the probability that the first
variable equals $x_1$, the second $x_2$, ..., the last $x_n$; e.g. $v(1, 1, 1, 0, \ldots, 0)$
is the probability that only the three first events occur. Hence the
$v$'s are $2^n$ probabilities, arbitrary except for the condition to have the sum 1.

Instead of these $v$'s we often introduce another set of $2^n - 1$ probabilities,
namely $p_i$, the probability of the occurrence of $E_i$ $(i = 1, \ldots, n)$; $p_{ij}$, that of
the joint occurrence of $E_i$ and $E_j$ $(i, j = 1, \ldots, n)$; ...; $p_{12 \cdots n}$, the probability
that all the events occur.


¹ Research under a grant-in-aid of the American Philosophical Society.





It may be noted that instead of the $p_i, p_{ij}, \ldots, p_{12 \cdots n}$ we could quite as
well use a system of $q_i, q_{ij}, \ldots, q_{12 \cdots n}$, where $q_i$ is the probability of the
non-occurrence of $E_i$ (or of the occurrence of the complementary event $E_i'$), $q_{ij}$ that of the joint
non-occurrence of $E_i$ and $E_j$ (of the occurrence of $E_i' E_j'$), and $q_{12 \cdots n}$ the probability
of $E_1' E_2' \cdots E_n'$.

The use of the $p$'s (or $q$'s) instead of the "elementary probabilities" $v$ is
justified by the fact that the $p$'s are $(2^n - 1)$ independent linear combinations
of the $v$'s and that therefore the $v$'s and the $p$'s (or the $v$'s and the $q$'s) determine
each other uniquely. There exist in fact the following well known relations,
(1) and (2). The first set (1) gives just the definition of the $2^n - 1$ probabilities
$p$ in terms of the $v$'s, and the second set expresses the $v$'s by the $p$'s as the result
of the solution of the $2^n - 1$ independent linear equations (1). Thus we have,
beginning with $p_{12 \cdots n}$:

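$$\begin{aligned}
p_{12 \cdots n} &= v(1, 1, \ldots, 1), \\
p_{12 \cdots n-1} &= \sum_{x_n} v(1, 1, \ldots, 1, x_n), \\
&\;\;\vdots \\
p_{12} &= \sum_{x_3} \cdots \sum_{x_n} v(1, 1, x_3, \ldots, x_n), \\
&\;\;\vdots \\
p_n &= \sum_{x_1} \cdots \sum_{x_{n-1}} v(x_1, x_2, \ldots, x_{n-1}, 1),
\end{aligned} \tag{1}$$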


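and solving successively:

$$\begin{aligned}
v(1, 1, \ldots, 1) &= p_{12 \cdots n}, \\
v(1, 1, \ldots, 1, 0) &= p_{12 \cdots n-1} - p_{12 \cdots n}, \\
&\;\;\vdots \\
v(1, 1, 0, \ldots, 0) &= p_{12} - \sum_{\gamma_1} p_{12\gamma_1} + \sum_{\gamma_1} \sum_{\gamma_2} p_{12\gamma_1\gamma_2} - \cdots \pm p_{12 \cdots n}, \\
&\;\;\vdots \\
v(0, 0, \ldots, 0, 1) &= p_n - \sum_{\gamma_1} p_{\gamma_1 n} + \sum_{\gamma_1} \sum_{\gamma_2} p_{\gamma_1\gamma_2 n} - \cdots \\
&\qquad \pm \sum_{\gamma_1} \cdots \sum_{\gamma_{n-2}} p_{\gamma_1\gamma_2 \cdots \gamma_{n-2} n} \mp p_{12 \cdots n}.
\end{aligned} \tag{2}$$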

The successive solution of the system (1) with respect to the "unknowns" $v$
is possible because each new equation in (1) contains exactly one new unknown $v$;
e.g. in the equation defining $p_{12}$ the only "unknown" is $v(1, 1, 0, 0, \ldots, 0)$, all
the $v$'s with more than two "1"s having already been computed from the
foregoing equations.
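The passage between the $v$'s and the $p$'s can be illustrated for a small number of events. The following is a minimal sketch assuming n = 3 and a randomly generated system of elementary probabilities; the function names and the dictionary layout (results as 0/1 tuples, index sets as sorted tuples) are choices made here for illustration. The inversion uses inclusion and exclusion over the positions that are 0, which is the content of relations (2).

```python
from itertools import combinations, product
import random

def p_from_v(v, n):
    """Relations (1): p_S is the sum of v(x) over all results x with x_i = 1 for i in S."""
    return {S: sum(pr for x, pr in v.items() if all(x[i] for i in S))
            for r in range(1, n + 1) for S in combinations(range(n), r)}

def v_from_p(p, n):
    """Relations (2): inclusion-exclusion over the indices that are 0 in x."""
    v = {}
    for x in product((0, 1), repeat=n):
        ones = tuple(i for i in range(n) if x[i])
        zeros = [i for i in range(n) if not x[i]]
        if not ones:
            continue  # v(0, ..., 0) is not determined by the 2^n - 1 values p alone
        v[x] = sum((-1) ** r * p[tuple(sorted(ones + T))]
                   for r in range(len(zeros) + 1)
                   for T in combinations(zeros, r))
    return v

random.seed(0)
n = 3
weights = [random.random() for _ in range(2 ** n)]
total = sum(weights)
v = {x: w / total for x, w in zip(product((0, 1), repeat=n), weights)}
p = p_from_v(v, n)
v_back = v_from_p(p, n)
print(max(abs(v[x] - v_back[x]) for x in v_back))
```

The printed maximum discrepancy should be at rounding level, showing that the $2^n - 1$ values $p$ recover every $v$ except $v(0, \ldots, 0)$, which follows from the condition that the $v$'s sum to 1.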

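If we choose to use the system of the $q$'s we have in the same way:

$$\begin{aligned}
q_{12 \cdots n} &= v(0, 0, \ldots, 0), \\
&\;\;\vdots \\
q_n &= \sum_{x_1} \cdots \sum_{x_{n-1}} v(x_1, x_2, \ldots, x_{n-1}, 0),
\end{aligned} \tag{1'}$$

and the inverse system

$$\begin{aligned}
v(0, 0, \ldots, 0) &= q_{12 \cdots n}, \\
&\;\;\vdots \\
v(1, 1, \ldots, 1, 0) &= q_n - \sum_{\gamma_1} q_{\gamma_1 n} + \sum_{\gamma_1} \sum_{\gamma_2} q_{\gamma_1\gamma_2 n} - \cdots \pm q_{12 \cdots n}.
\end{aligned} \tag{2'}$$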

11 11 Tj 

Coming back to Chung's theorem we see that the probability $p_1(\alpha_1, \ldots, \alpha_r)$
that at least one event among $E_{\alpha_1}, \ldots, E_{\alpha_r}$ occurs is evidently:

$$p_1(\alpha_1, \ldots, \alpha_r) = 1 - q_{\alpha_1 \cdots \alpha_r}. \tag{3}$$

If we introduce this value in (I) all the "1"s introduced by (3) cancel and we
get our system (2'). (Of course we could in the same way deduce from (2) a
system of equations for $q_1(\alpha_1, \ldots, \alpha_r) = 1 - p_{\alpha_1 \cdots \alpha_r}$, where $q_1(\alpha_1, \ldots, \alpha_r)$ is
the probability that at most $r - 1$ events among the $r$ given ones occur.)

As the $v$-values on the left side of (2') are $2^n - 1$ independent probabilities
only subject to the restriction that they have the sum $\le 1$, we see that the
expressions on the right side of (2') must have the same properties; and these
properties are also sufficient for a system of $q$'s (or for a system of $p_1(\alpha_1, \ldots, \alpha_r)$);
indeed if they are fulfilled, these $2^n - 1$ expressions define by means of
(2') a system of elementary probabilities $v(x_1, \ldots, x_n)$. Hence the theorems
VI and VII quoted at the beginning of this note are rather close consequences
of the basic relations of the theory of arbitrary events.

2. Remark 1. We may add one more equation to equations (1), namely

$$1 = \sum_{x_1} \cdots \sum_{x_n} v(x_1, \ldots, x_n),$$

thus introducing $v(0, 0, \ldots, 0)$. Then in system (2) the corresponding new
equation will be

$$v(0, 0, \ldots, 0) = 1 - \sum_{\gamma_1} p_{\gamma_1} + \sum_{\gamma_1} \sum_{\gamma_2} p_{\gamma_1\gamma_2} - \cdots \pm p_{12 \cdots n}$$

(and analogously for the $q$'s). In this way we get two systems (1) and (2)
each consisting of $2^n$ equations and in (2) the sum of the expressions on the right
side is now identically equal to one. Hence necessary and sufficient conditions
will now be that all these $2^n$ expressions must be non-negative.

Remark 2. It is convenient to interpret or prove results of the kind considered
here in terms of elementary measure theory: $p_1$ is the measure of a set
$E_1$; $p_2$ of $E_2$; $p_{12}$ that of the intersection $E_1 E_2$, etc., and analogously for the
$v$'s: e.g. $v(1, 1, 0, 0) = m(E_1 E_2 E_3' E_4')$. Consider now the equations
(2). The first is an identity. In the second, $p_{12 \cdots n-1}$ measures the product
$E_1 E_2 \cdots E_{n-1}$, whereas $p_{12 \cdots n-1} - p_{12 \cdots n}$ is the measure of that part of this
product which does not belong to $E_n$, and it therefore equals
$m(E_1 E_2 \cdots E_{n-1} E_n') = v(1, 1, \ldots, 1, 0)$. In the last equation (2),
$\sum_{\gamma_1} \cdots \sum_{\gamma_{n-2}} p_{\gamma_1 \cdots \gamma_{n-2} n}$ is the






measure of that part of $E_n$ which belongs to at least $(n - 2)$ other sets (besides
$E_n$); whereas this same value minus $p_{12 \cdots n}$ is the measure of the part of $E_n$
which belongs exactly to $(n - 2)$ other sets; now subtracting this expression
from $\sum_{\gamma_1} \cdots \sum_{\gamma_{n-3}} p_{\gamma_1 \cdots \gamma_{n-3} n}$ we get the measure of the part of $E_n$ which belongs
exactly to $(n - 3)$ other sets, and finally $p_n - \cdots \pm p_{12 \cdots n}$ is the measure of
that part of $E_n$ which belongs to no other set besides, i.e. $m(E_1' E_2' \cdots E_{n-1}' E_n) =
v(0, 0, \ldots, 0, 1)$. This kind of proof does not require the solution of (1).

Remark 3. According to (1) the $p_i, p_{ij}, \ldots, p_{12 \cdots n}$ are the ordinary
moments of order $1, 2, \ldots, n$ of $v(x_1, x_2, \ldots, x_n)$. There are of course many
more than $2^n - 1$ moments of this n-variate distribution but only $2^n - 1$ of
them are different from each other because $1^k = 1$.

3. Denote by $p_n(x)$, $(x = 0, 1, \ldots, n)$ the probability of getting exactly $x$
successes in $n$ trials. (See e.g. [2], [3].) For the simplest case of arbitrary events,
the Bernoulli problem, $p_n(x) = \binom{n}{x} p^x (1 - p)^{n-x}$. Then the probability of
at least $x$ successes (of a number of successes $\ge x$) is

$$V_n(x) = p_n(x) + p_n(x + 1) + \cdots + p_n(n), \tag{4}$$

or $p_x(1, 2, \ldots, n)$ in Chung's notation. The $p_n(x)$ are by their definition
$(n + 1)$ arbitrary positive numbers with sum equal to one. These are the only
necessary and sufficient restrictions for $p_n(x)$. $V_n(x)$, the "cumulative"
distribution of $p_n(x)$, which is defined for $x$ between $-\infty$ and $+\infty$, is a monotone
non-increasing step function with its $(n + 1)$ steps at $x = 0, 1, 2, \ldots, n$ equal
to the $p_n(x)$.

Consider next $p_x(\alpha_1, \alpha_2, \ldots, \alpha_r)$ where $r \le n$; these are cumulative
distributions each corresponding to one of the $\binom{n}{r}$ probabilities $p_r(x)$, where $p_r(x)$
is the probability of exactly $x$ successes in a group of $r$ trials.² For each group
$(\alpha_1, \ldots, \alpha_r)$ the corresponding $p_r(x)$, $(x = 0, 1, \ldots, r)$ are positive and with
sum equal to one. Hence if we always omit $p_r(0)$ because of $\sum_{x=0}^{r} p_r(x) = 1$,
all the different $p_1(x), p_2(x), \ldots, p_n(x)$ together define

$$1 \cdot n + n(n - 1) + \binom{n}{2}(n - 2) + \cdots + n = n 2^{n-1}$$

values. As $n 2^{n-1} > 2^n$ for $n > 2$ we realize that between these $n 2^{n-1}$
probabilities there must exist a set of $n 2^{n-1} - (2^n - 1)$ identical relations; and the
same is true for the corresponding cumulative distributions $V_r(x)$ or
$p_x(\alpha_1, \ldots, \alpha_r)$. Thus it seems reasonable that it may be hard to use these
$p_x(\alpha_1, \ldots, \alpha_r)$ in the characterization of a problem of arbitrarily linked events if
$x > 1$. On the other hand we have seen in 1 that for $x = 1$ they reduce to the


² One may write here $p_r(x)$ instead of $p_{(\alpha_1, \alpha_2, \ldots, \alpha_r)}(x)$.





$2^n - 1$ probabilities $q_i, \ldots, q_{12 \cdots n}$, which of course define the system of
events unequivocally.

4. Introduce in the usual way the sums of the $p_i$, $p_{ij}$, etc.

$$S_1 = \sum_i p_i, \quad S_2 = \sum_{i<j} p_{ij}, \quad \ldots, \quad S_n = p_{12 \cdots n}, \quad \text{and } S_0 = 1. \tag{5}$$

Now add in system (1) first the $n$ equations which define $p_1, p_2, \ldots, p_n$,
then the $\binom{n}{2}$ equations for the $p_{ij}$, etc. Observing that $p_n(x)$ is the sum of
all these elementary probabilities $v(x_1, x_2, \ldots, x_n)$ with exactly $x$ "1"s and $(n - x)$
"0"s we get as the result of these $n$ additions the well known formulae:

$$S_\gamma = \sum_{x=\gamma}^{n} \binom{x}{\gamma} p_n(x), \qquad (\gamma = 0, 1, \ldots, n). \tag{6}$$

Here $\gamma = 0$ gives $S_0 = 1 = \sum_{x=0}^{n} p_n(x)$. We may solve successively these $(n + 1)$
linear equations with respect to $p_n(n), p_n(n - 1), \ldots, p_n(0)$, each linear
equation containing only one new unknown, and find:

$$p_n(x) = \sum_{\gamma=x}^{n} (-1)^{\gamma - x} \binom{\gamma}{x} S_\gamma, \qquad (x = 0, 1, \ldots, n). \tag{7}$$

(These formulae could also have been derived from (2) by collecting groups of
equations such that all the corresponding $v(x_1, \ldots, x_n)$ contain the same
number of "1"s. In the measure interpretation $p_n(x)$ is the measure of that
part which belongs exactly to $x$ of the original sets and $S_\gamma$ measures the set which
belongs to at least $\gamma$ of these sets.) We also find by "cumulating" equations (6):

$$S_\gamma = \sum_{x=\gamma}^{n} \binom{x - 1}{\gamma - 1} V_n(x), \qquad (\gamma = 1, 2, \ldots, n), \tag{8}$$

and the inverse system

$$V_n(x) = \sum_{\gamma=x}^{n} (-1)^{\gamma - x} \binom{\gamma - 1}{x - 1} S_\gamma, \qquad (x = 1, 2, \ldots, n). \tag{9}$$

(6) and (8) are of the same type as (1), and (7) and (9) of the same as (2). We
also may deduce analogous formulae by interchanging the roles of 0 and 1 and
introducing a system of $T_1, T_2, \ldots, T_n$ which depends on the $q$'s in the same
way as the $S_1, S_2, \ldots, S_n$ defined in (5) depend on the $p$'s.

We have seen that the $p_n(x)$ are $(n + 1)$ arbitrary non-negative numbers
subject to only the condition of having their sum equal to one. But the $S_\gamma$
$(\gamma = 0, \ldots, n)$ are not arbitrary as we see from (7). The $(n + 1)$ expressions on
the right side of (7) must each be non-negative if they are to define the probabilities
$p_n(x)$ (their sum is identically equal to one). Then and only then they define
a system of arbitrarily linked events $E_1, \ldots, E_n$.
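Formulae (6) and (7) are easy to exercise numerically. The sketch below is an illustration, assuming the Bernoulli case with n = 4 and p = 0.3 as test data; the function names are hypothetical. It forms the sums $S_\gamma$ from the $p_n(x)$ and then recovers the $p_n(x)$ by the inverse relation.

```python
from math import comb

def s_from_pn(pn):
    """(6): S_gamma = sum over x >= gamma of C(x, gamma) * p_n(x)."""
    n = len(pn) - 1
    return [sum(comb(x, g) * pn[x] for x in range(g, n + 1)) for g in range(n + 1)]

def pn_from_s(S):
    """(7): p_n(x) = sum over gamma >= x of (-1)^(gamma - x) * C(gamma, x) * S_gamma."""
    n = len(S) - 1
    return [sum((-1) ** (g - x) * comb(g, x) * S[g] for g in range(x, n + 1))
            for x in range(n + 1)]

n, p = 4, 0.3
pn = [comb(n, x) * p ** x * (1 - p) ** (n - x) for x in range(n + 1)]
S = s_from_pn(pn)
pn_back = pn_from_s(S)
print(S[:3], max(abs(a - b) for a, b in zip(pn, pn_back)))
```

In the Bernoulli case $S_\gamma$ reduces to $\binom{n}{\gamma} p^\gamma$, so $S_0 = 1$ and $S_1 = np$, which gives a quick check on the output.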

The $p_n(x)$, $(x = 0, 1, \ldots, n)$ are of course not equivalent to the complete





system of $2^n - 1$ values $v(x_1, x_2, \ldots, x_n)$ and the same remark holds for the
$S_0, \ldots, S_n$ and the system of $p_i, p_{ij}, \ldots, p_{12 \cdots n}$. But often we are
particularly interested in problems dealing only with the $p_n(x)$ (and $S_\gamma$). (For
instance the author has studied [2] the asymptotic behavior of $p_n(x)$ as $n$ tends
in different ways towards infinity.) The simplest way to indicate a particular
$p$-system corresponding to given $S_\gamma$ is of course to assume all the $p_i$ equal to
each other, all the $p_{ij}$ equal to each other etc. and to put therefore:

$$p_1 = p_2 = \cdots = p_n = \frac{1}{n} S_1, \qquad p_{12} = p_{13} = \cdots = \frac{S_2}{\binom{n}{2}}, \qquad \ldots, \qquad p_{12 \cdots n} = S_n.$$

In the corresponding $v$-system all these $v$'s which show the same number of
"1"s equal each other.

We see from (6) that the $S_\gamma$ (multiplied by $\gamma!$) are the factorial moments of
order $0, 1, \ldots, n$ of the distribution $p_n(x)$. Therefore by (7) we get the $p_n(x)$
in terms of their factorial moments up to order $n$. We may therefore also say:
Necessary and sufficient conditions that a system of numbers $N_0 = 1, N_1, \ldots, N_n$
be the factorial moments of an arithmetical distribution with at most $(n + 1)$ steps
at $x = 0, 1, \ldots, n$ are the inequalities

$$\sum_{\gamma=x}^{n} (-1)^{\gamma - x} \frac{N_\gamma}{x!\,(\gamma - x)!} \ge 0, \qquad (x = 0, 1, \ldots, n). \tag{10}$$

Note that here there is no more allusion to a set of arbitrary events; (10) are
the necessary and sufficient conditions for a set of $(n + 1)$ numbers to be the
$(n + 1)$ (factorial) moments of an arbitrary arithmetic distribution with its
abscissae given. The linear inequalities (10) differ very much from the basic
inequalities in the classical problem of moments, because in our problem the
abscissae of the steps are given in advance.
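Condition (10) translates directly into a feasibility test. The sketch below is illustrative: the function names are hypothetical, the first test sequence consists of the factorial moments of a Bernoulli distribution with n = 4 and p = 0.3 (which must pass), and the second is an arbitrary sequence chosen here so that the inequality fails at x = 1.

```python
from math import factorial

def implied_probabilities(N):
    """Left sides of (10): sum over gamma >= x of (-1)^(gamma-x) N_gamma / (x! (gamma-x)!)."""
    n = len(N) - 1
    return [sum((-1) ** (g - x) * N[g] / (factorial(x) * factorial(g - x))
                for g in range(x, n + 1)) for x in range(n + 1)]

def is_factorial_moment_sequence(N, tol=1e-12):
    """True if N_0 = 1 and every expression in (10) is non-negative."""
    return abs(N[0] - 1) <= tol and all(q >= -tol for q in implied_probabilities(N))

# Factorial moments of a Bernoulli (binomial) distribution: n! / (n - g)! * p^g.
good = [factorial(4) / factorial(4 - g) * 0.3 ** g for g in range(5)]
bad = [1, 1, 5]   # hypothetical sequence; at x = 1 the expression is 1 - 5 = -4

print(is_factorial_moment_sequence(good), is_factorial_moment_sequence(bad))
```

When the test passes, the values returned by `implied_probabilities` are the step heights $p_n(x)$ themselves, so the check also constructs the distribution.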

5. In some problems (e.g., some questions connected with the law of large
numbers, with correlation theory, with analysis of variance) we are only
concerned with the first and second moment of a distribution. Thus we are led
to the following question: Given $r + 1$ numbers $N_0, N_1, \ldots, N_r$ $(r \le n)$,
indicate a set of necessary and sufficient conditions such that these numbers
are the moments of an arithmetic distribution with at most $(n + 1)$ steps, at
$0, 1, 2, \ldots, n$.³ Some sort of an answer, which may work well in particular
cases, can immediately be deduced from (10): "$r + 1$ numbers $N_0, N_1, \ldots, N_r$
will be the factorial moments of an arithmetic distribution with, at
most, $(n + 1)$ steps at $0, 1, 2, \ldots, n$ if and only if it is possible to indicate $s$

³ This problem and the method of its solution have much in common with a problem
studied in R. von Mises' paper [4].





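numbers $N_{r+1}, \ldots, N_{r+s}$, $(0 \le s \le n - r)$, such that for the $r + s + 1$
numbers $N_0, N_1, \ldots, N_r, \ldots, N_{r+s}$ the $r + s + 1$ inequalities

$$\sum_{\gamma=x}^{r+s} (-1)^{\gamma - x} \frac{N_\gamma}{x!\,(\gamma - x)!} \ge 0, \qquad (x = 0, 1, \ldots, r + s)$$

be satisfied."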

The proof of this statement is self evident but the statement itself cannot
be considered satisfactory. We get a general solution in the following way.

Let $f_1(t), \ldots, f_r(t)$ be $r$ functions of the chance variable $t$, $v(t)$ an arithmetic
probability with $n$ given attributes $t_1, t_2, \ldots, t_n$ and

$$E[f_\rho(t)] = \sum_{\gamma=1}^{n} f_\rho(t_\gamma) v(t_\gamma) = \sum_{\gamma=1}^{n} a_{\rho\gamma} v_\gamma = S_\rho, \qquad (\rho = 1, 2, \ldots, r), \tag{11}$$

the expectations of $f_\rho(t)$ with respect to $v(t)$. We wish to indicate necessary
and sufficient conditions for the $r$ numbers $S_\rho$. For $f_\rho(t) = t^\rho$ we have the
problem stated above where the first $r$ moments are given.

Call (C) the r-dimensional curve $x_\rho = f_\rho(t)$ and $P_1, P_2, \ldots, P_n$ the points
on (C) with coordinates $f_\rho(t_\gamma) = a_{\rho\gamma}$, $(\rho = 1, \ldots, r;\ \gamma = 1, \ldots, n)$, $S$ the given
point with coordinates $S_\rho$. In this case, the point $S$ must be contained in the
smallest convex body (B) determined by the n points $P_1, \ldots, P_n$. This condition
is necessary and sufficient. Because, if we interpret the $v_\gamma$, which are $\ge 0$, as
masses of the points $P_\gamma$, with sum equal to one, then $S$ is the center of gravity
of these masses and it is well known that the above mentioned condition for $S$
has to be fulfilled. But this condition is also sufficient, because if $S$ is contained
in (B) there exists always a simplex of at most $r$ dimensions, consisting of at
most $(r + 1)$ of the given points such that $S$ is the center of gravity of
appropriate masses in these points.

If we want to indicate explicitly the inequalities for the $S_\rho$ we must know
the boundary of (B). This is determined by its planes of support
("Stützebenen," Minkowski), sometimes called tack planes. A tack plane is a plane
which does not separate any two points of the given point set and contains at
least one point of this set. A plane is said to separate two points if, when the
coordinates of the points are written in the equation of the plane, two values
with opposite signs result. These definitions enable us to find those points $P_\gamma$
which lie on the boundary of (B) and to determine this boundary. (E.g. for
$r = 3$ we have to find such triples of $i, k, l$ that the determinant which represents
the equation of the plane through these three points has the same sign for all
possible other points $P_\gamma$. If the $S_\rho$ are the first three moments with respect to
the origin, these determinants become Vandermonde determinants and we find
easily that the boundary planes are each passing through two neighboring
points $P_\gamma$, $P_{\gamma+1}$ and one of the endpoints $P_1$ or $P_n$. If $r = 2$, and the first
two moments are given, the boundary of (B) consists of the polygon
$P_1 P_2 \cdots P_n P_1$.) Then we find without difficulty the conditions to be satisfied by $S$ in
the form of linear inequalities between the given $S_1, S_2, \ldots, S_r$.
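For $r = 2$ the polygon condition can be tested with elementary plane geometry. The following is a minimal sketch under assumptions made here for illustration: the attributes are taken to be the integers 0, 1, ..., n, the functions are $f_1(t) = t$ and $f_2(t) = t^2$, and the given point is the pair of ordinary moments $(S_1, S_2)$. Since the points $(t, t^2)$ lie on a convex curve, walking along them and closing with the chord from the last point to the first traverses the polygon counterclockwise, so a point is inside when it lies to the left of every edge.

```python
def cross(o, a, b):
    """Signed area test: positive if b lies to the left of the directed line o -> a."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def moments_feasible(s1, s2, n, tol=1e-12):
    """True if (s1, s2) lies in the polygon with vertices (t, t^2), t = 0..n."""
    pts = [(t, t * t) for t in range(n + 1)]     # counterclockwise along the parabola
    edges = zip(pts, pts[1:] + pts[:1])          # includes the closing chord
    return all(cross(a, b, (s1, s2)) >= -tol for a, b in edges)

inside = moments_feasible(2.0, 5.0, 4)       # mean 2, second moment 5
on_edge = moments_feasible(1.5, 2.5, 4)      # mass 1/2 at each of 1 and 2
outside = moments_feasible(1.5, 2.4, 4)      # satisfies s2 >= s1^2 but is not attainable
print(inside, on_edge, outside)
```

The last case shows the point of the remark in the text: the pair (1.5, 2.4) satisfies the usual inequality $S_2 \ge S_1^2$ of the continuous problem of moments, yet no distribution on the given abscissae produces it.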

We get the continuous case as a limit of the discontinuous case as $t_{\gamma+1} - t_\gamma \to 0$





and the points $P(t)$ take up the whole curve (C), e.g. between $t = 0$ and $\infty$.
Then the relations between the given $S_\rho$ become non-linear inequalities, well
known for the problem of moments.

REFERENCES 

[1] Kai Lai Chung, "On the probability of the occurrence of at least m events among n
arbitrary events," Annals of Math. Stat., Vol. 12 (1941).

[2] Hilda Geiringer, "Sur les variables aléatoires arbitrairement liées," Revue
Interbalcanique, 1938.

[3] Hilda Geiringer, "Bemerkung zur Wahrscheinlichkeit nicht unabhängiger Ereignisse,"
Revue Interbalcanique, 1939.

[4] R. v. Mises, "The limits of a distribution function if two expected values are given,"
Annals of Math. Stat., Vol. 10, 1939.


AN INEQUALITY FOR MILL’S RATIO 
By Z. W. Birnbaum 
University of Washington 

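Mr. R. D. Gordon¹ recently proved the inequalities

$$\frac{x}{x^2 + 1} \cdot \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2} \le \frac{1}{\sqrt{2\pi}} \int_x^\infty e^{-t^2/2}\,dt \le \frac{1}{x} \cdot \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2} \qquad \text{for } x > 0.$$

In the present note we show that the lower inequality can be replaced by the
better estimate

$$\frac{\sqrt{4 + x^2} - x}{2} \cdot \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2} < \frac{1}{\sqrt{2\pi}} \int_x^\infty e^{-t^2/2}\,dt.$$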


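Proof: According to a well-known theorem of Jensen,² for $f(t)$ convex and
$g(t) \ge 0$ in the interval $(a, b)$, the following inequality holds:

$$f\left(\frac{\int_a^b t\,g(t)\,dt}{\int_a^b g(t)\,dt}\right) \le \frac{\int_a^b f(t)\,g(t)\,dt}{\int_a^b g(t)\,dt}.$$

For $a = x$, $b = \infty$, $f(t) = 1/t$, $g(t) = t e^{-t^2/2}$ this inequality gives

$$\frac{\int_x^\infty t e^{-t^2/2}\,dt}{\int_x^\infty t^2 e^{-t^2/2}\,dt} < \frac{\int_x^\infty e^{-t^2/2}\,dt}{\int_x^\infty t e^{-t^2/2}\,dt}.$$

Since

$$\int_x^\infty t e^{-t^2/2}\,dt = e^{-x^2/2}$$

and

$$\int_x^\infty t^2 e^{-t^2/2}\,dt = x e^{-x^2/2} + \int_x^\infty e^{-t^2/2}\,dt,$$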


¹ R. D. Gordon, "Values of Mill's ratio of area to bounding ordinate of the normal
probability integral for large values of the argument," Annals of Math. Stat., Vol. 12 (1941),
pp. 364-366.

² See for example: G. H. Hardy, J. E. Littlewood and G. Pólya, Inequalities, Cambridge,
1934, pp. 150-151.





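we find

$$e^{-x^2} < x e^{-x^2/2} \int_x^\infty e^{-t^2/2}\,dt + \left(\int_x^\infty e^{-t^2/2}\,dt\right)^2,$$

and hence, solving this quadratic inequality for the integral,

$$\int_x^\infty e^{-t^2/2}\,dt > \frac{\sqrt{4 + x^2} - x}{2}\, e^{-x^2/2}.$$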
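The ordering of the bounds can be checked numerically. The sketch below is an illustration only: it evaluates Mill's ratio through the complementary error function, using the identity that the integral of $e^{-t^2/2}$ from $x$ to infinity equals $\sqrt{\pi/2}\,\mathrm{erfc}(x/\sqrt{2})$, and compares it with Gordon's lower bound $x/(x^2+1)$, the improved lower bound $(\sqrt{4+x^2}-x)/2$, and the upper bound $1/x$. The function names are hypothetical.

```python
from math import erfc, exp, pi, sqrt

def mills_ratio(x):
    """Ratio of the tail integral of e^{-t^2/2} from x to infinity to e^{-x^2/2}."""
    tail = sqrt(pi / 2) * erfc(x / sqrt(2))
    return tail / exp(-x * x / 2)

def bounds(x):
    """Gordon's lower bound, the improved lower bound, and the upper bound."""
    return x / (x * x + 1), (sqrt(4 + x * x) - x) / 2, 1 / x

for x in (0.5, 1.0, 2.0, 5.0):
    gordon, improved, upper = bounds(x)
    print(x, round(gordon, 4), round(improved, 4), round(mills_ratio(x), 4), round(upper, 4))
```

At each of these arguments the four quantities should appear in increasing order, with the improved bound lying between Gordon's bound and the ratio itself.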













ADDITIVE PARTITION FUNCTIONS AND A CLASS OF STATISTICAL 

HYPOTHESES 

By J. WoLFOWiTZ 

New York City

1. Introduction. The purpose of the first part of this paper is to prove
several theorems about a class of functions of partitions which are additive in
structure and subject to mild restrictions. These theorems may be regarded as
contributions to the theory of numbers, but if one makes certain assignments
of probabilities to the partitions the theorems may be expressed as statements
about asymptotic distributions. It is in this latter, probabilistic language that
we shall carry out the proofs, for the following reasons: The discussion will be
more concise and certain circumlocutions will be avoided. The theorems have
statistical application and a number of theorems discussed recently in statistical
literature are corollaries of one of our theorems.

In the second part of this paper the theory of testing statistical hypotheses
where the form of the distribution functions is totally unknown and only
continuity is assumed, will be discussed. The exact extension of the likelihood
ratio criterion to this case will be given. Approximations to the application of
this criterion in two problems will be proposed, one of which applies the results
mentioned above. Lastly, in connection with the second problem, a
combinatorial problem will be solved which is new and has interest per se.

2. Partitions of a single integer. Let $n$ be a positive integer and
$A = (a_1, a_2, \ldots, a_s)$ be any sequence of positive integers $a_i$ $(i = 1, 2, \ldots, s)$,
where $\sum_{i=1}^{s} a_i = n$, and $s$ may be any integer from 1 to $n$. Two sequences $A$
which have different elements or the same elements arranged in different order
are to be considered distinct, so it is easy to see that there are $2^{n-1}$ sequences $A$.
We shall consider the sequence $A$ as a stochastic variable and assign to all
sequences $A$ the same probability, which is therefore $2^{-n+1}$. Let $r_j$ be the
number of elements $a$ in $A$ which equal $j$ $(j = 1, 2, \ldots, n)$, so that $r_j$ is a
stochastic variable. Let $k$ be an integer $< n$. Then the joint distribution of
the stochastic variables $r_1, r_2, \ldots, r_k$ is given as follows: The probability that
$r_i = b_i$ $(i = 1, 2, \ldots, k)$ is


$$2^{-n+1} \sum_{r} \sum \frac{r!}{\prod_{i=1}^{k} (b_i)!\;(r_{(k+1)})! \cdots (r_n)!}, \tag{2.1}$$

where the inner summation is carried out over all sets of non-negative integers
$r_{(k+1)}, \ldots, r_n$ such that

$$b_1 + b_2 + \cdots + b_k + r_{(k+1)} + \cdots + r_n = r, \tag{2.2}$$

$$b_1 + 2b_2 + \cdots + k b_k + (k + 1) r_{(k+1)} + \cdots + n r_n = n. \tag{2.3}$$





(Tl«i‘ //, , ttf rtniM', ari‘ i»in"r»*gali\ t* iutt'gfr.*-.! 

I.<‘t r Hr,, aiifl 

^ H '’■t (^' < n), 

I 

w that r and ntt nrv taitii “torhaatic* variatthw- Thn jirrdtaliility tlinf at, the 
K5imt’ time 

(2.4) r, h, , li 1, ,k), 

and 

(2.5) ra ^.u -- h,i < i; . 


is given by (2,1) with the re,«trieticin 

(2.()) ra,ti + * • • d* r„ ~ > 

added to (he realrictions (2.2) nntl (2.3). With thi.s added rtwtrietion the 

* 

summation in (2.1) may he performed as follows: I^et / = H • It *■''* 

to see that the number of sequencefi A where every a, > A', r = ru ui = I»a+i) » 
and Sa, ^ n ~ i, i.s given by the coefficient of x" ‘ in the rairely formal e.xpan- 
sion in x of 


(/H ^ 4 . + 






^{h ^ I, 


/ 1 V'*»i) 

Vl - xj 


and is 


/n — t — Afe(*+i) — 1 

\ h(t+i) — 1 

Hence PI(2.4) and (2.5)), where this symbol will always denote the probability 
of the relation in braces, is seen to be 


(2.7) 


I'l + 


(bc*+n)i IT 

^-.l 


— ( — Ah(t+n — 1\ 

Ihic^i) “"1 / 
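The binomial count used in (2.7) can be checked by direct enumeration. The following is only an illustrative sketch and not part of the original argument: the names `count_large_part_sequences` and `closed_form` are assumptions made here, with `m` standing for $n - t$ and `b` for $b_{(k+1)}$.

```python
from itertools import product
from math import comb


def count_large_part_sequences(m, b, k):
    """Brute-force count of ordered sequences of b integers, each > k, with sum m."""
    values = range(k + 1, m + 1)
    return sum(1 for seq in product(values, repeat=b) if sum(seq) == m)


def closed_form(m, b, k):
    """Coefficient of x^m in (x^(k+1) / (1 - x))^b, i.e. C(m - k*b - 1, b - 1)."""
    top = m - k * b - 1
    if top < 0:
        return 0  # no admissible sequence exists
    return comb(top, b - 1)


if __name__ == "__main__":
    # Compare the two counts on a small grid of (m, b, k).
    for m in range(1, 13):
        for b in range(1, 4):
            for k in range(0, 4):
                assert count_large_part_sequences(m, b, k) == closed_form(m, b, k)
    print("closed form agrees with enumeration")
```

The enumeration is exponential in `b` and is meant only for small cases; the closed form is what enters (2.7).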


If $X$ is a stochastic variable, let $E(X)$ and $\sigma^2(X)$ denote, respectively, the
mean and variance of $X$ (if they exist), and if $Y$ is another stochastic variable,
let $\sigma(XY)$ be the covariance between $X$ and $Y$. Also let

$$\bar{X} = \frac{X - E(X)}{\sigma(X)}.$$

By the distribution of $X$ we shall mean a function $\varphi(x)$ such that $P\{X < x\} = \varphi(x)$.
These conventions being established, we seek first to evaluate $E(r_i)$.
This may be done by differentiating with respect to $y$ the coefficient of $x^n$ in the
purely formal expansion in $x$ of $2^{-n+1}(x + \cdots + x^{i-1} + y x^i + x^{i+1} + \cdots)^r$,
setting $y = 1$ and summing over all values of $r$. We have therefore to evaluate

$$2^{-n+1} \sum_{r} r \binom{n - i - 1}{r - 2},$$

which is easily seen to give us the result

$$E(r_i) = (n - i + 3)\, 2^{-i-1} \qquad (i < n), \tag{2.8}$$

while it is obvious that

$$E(r_n) = 2^{-n+1}. \tag{2.9}$$
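Formulas (2.8) and (2.9) can be confirmed for small $n$ by listing all $2^{n-1}$ sequences $A$. The sketch below is an illustration under stated assumptions, not the author's method: the helper names `compositions` and `mean_count` are introduced here, and each sequence is generated from the $n-1$ possible cut points between unit steps.

```python
from fractions import Fraction
from itertools import product


def compositions(n):
    """Yield every ordered sequence of positive integers summing to n."""
    for cuts in product((0, 1), repeat=n - 1):
        parts, current = [], 1
        for cut in cuts:
            if cut:               # a cut closes the current part
                parts.append(current)
                current = 1
            else:                 # no cut: the current part grows by one
                current += 1
        parts.append(current)
        yield tuple(parts)


def mean_count(n, i):
    """Exact mean of r_i, the number of parts equal to i, over all sequences A."""
    total = sum(parts.count(i) for parts in compositions(n))
    return Fraction(total, 2 ** (n - 1))


if __name__ == "__main__":
    for n in range(2, 11):
        for i in range(1, n):
            assert mean_count(n, i) == Fraction(n - i + 3, 2 ** (i + 1))  # (2.8)
        assert mean_count(n, n) == Fraction(1, 2 ** (n - 1))              # (2.9)
    print("(2.8) and (2.9) agree with enumeration")
```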

By use of similar devices the variances and covariances of the $r_i$ may also be
obtained. We omit the details of those calculations and also the presentation
of the covariances, since the latter are not necessary for the proof of Theorem 2.
The results are:

$$\sigma^2(r_i) = n \left( \frac{1}{2^{i+1}} + \frac{3 - 2i}{2^{2i+2}} \right) + \left( \frac{3 - i}{2^{i+1}} + \frac{3i^2 - 12i + 5}{2^{2i+2}} \right) \qquad (i < \tfrac{1}{2} n). \tag{2.10}$$

The limitation on the value of $i$ is necessary because for $i \geq \tfrac{1}{2} n$ the processes for summing
binomial coefficients with the aid of the device described above are no longer
applicable. The matter is easily settled, however, for if $X$ is a stochastic
variable which can take only the values 0 or 1, then

$$\sigma^2(X) = E(X) - [E(X)]^2.$$

The $r_i$ for $i > \tfrac{1}{2} n$ are such variables, so that

$$\sigma^2(r_i) = \frac{n - i + 3}{2^{i+1}} - \frac{(n - i + 3)^2}{2^{2i+2}} \qquad (n > i > \tfrac{1}{2} n), \tag{2.11}$$

$$\sigma^2(r_n) = \frac{2^{n-1} - 1}{2^{2n-2}}. \tag{2.12}$$

Also without difficulty we have

$$\sigma^2(r_{n/2}) = \frac{n + 6}{2^{(n+4)/2}} - \frac{(n + 6)^2}{2^{n+4}} + \frac{1}{2^{n-2}} \tag{2.13}$$

when $n$ is even and $> 2$, and

$$E(r) = \tfrac{1}{2}(n + 1), \tag{2.14}$$

$$\sigma^2(r) = \tfrac{1}{4}(n - 1). \tag{2.15}$$

Finally,

$$E(r_{(k+1)}) = (n - k + 1)\, 2^{-k-1}. \tag{2.16}$$
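The variance formulas (2.10) to (2.13) and (2.15) can likewise be compared with exact enumeration for small $n$. This is a sketch under stated assumptions: the function names are illustrative, and `variance_formula` simply selects among (2.10), (2.11), (2.12) and (2.13) according to the size of $i$ relative to $n$.

```python
from fractions import Fraction
from itertools import product


def compositions(n):
    """Yield every ordered sequence of positive integers summing to n."""
    for cuts in product((0, 1), repeat=n - 1):
        parts, current = [], 1
        for cut in cuts:
            if cut:
                parts.append(current)
                current = 1
            else:
                current += 1
        parts.append(current)
        yield tuple(parts)


def exact_variance(values):
    """Population variance of a list of integers, as an exact fraction."""
    count = len(values)
    mean = Fraction(sum(values), count)
    return sum((v - mean) ** 2 for v in values) / count


def variance_of_count(n, i):
    """Variance of r_i over all equally likely sequences A."""
    return exact_variance([parts.count(i) for parts in compositions(n)])


def variance_of_length(n):
    """Variance of r, the number of parts of A."""
    return exact_variance([len(parts) for parts in compositions(n)])


def variance_formula(n, i):
    """Closed forms (2.10)-(2.13) for the variance of r_i."""
    p = Fraction(1, 2 ** (i + 1))
    if i == n:                                   # (2.12)
        q = Fraction(1, 2 ** (n - 1))
        return q - q * q
    mean = (n - i + 3) * p                       # (2.8)
    if 2 * i > n:                                # (2.11): r_i is 0 or 1
        return mean - mean ** 2
    if 2 * i == n:                               # (2.13): r_i may also equal 2
        return mean - mean ** 2 + Fraction(1, 2 ** (n - 2))
    return (n * (p + (3 - 2 * i) * p * p)        # (2.10)
            + (3 - i) * p + (3 * i * i - 12 * i + 5) * p * p)


if __name__ == "__main__":
    for n in range(3, 11):
        for i in range(1, n + 1):
            assert variance_of_count(n, i) == variance_formula(n, i)
        assert variance_of_length(n) == Fraction(n - 1, 4)   # (2.15)
    print("variance formulas agree with enumeration")
```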

The next results we shall need may be expressed in the following:

Theorem 1: As $n$ approaches infinity, the joint distribution of the stochastic
variables $\bar{r}_1, \ldots, \bar{r}_k, \bar{r}_{(k+1)}$ ($k$ any fixed positive integer), approaches the multivariate
normal distribution.

This theorem is proved as follows: Make the substitutions

$$x_i = \frac{r_i - n\,2^{-i-1}}{\sqrt{n}} \quad (i = 1, 2, \ldots, k), \qquad x_{(k+1)} = \frac{r_{(k+1)} - n\,2^{-k-1}}{\sqrt{n}},$$

in the expression

$$2^{-n+1}\, \frac{\left( \sum_{i=1}^{k} r_i + r_{(k+1)} \right)!}{(r_{(k+1)})! \prod_{i=1}^{k} (r_i)!} \binom{n - t - k r_{(k+1)} - 1}{r_{(k+1)} - 1}$$

which comes from (2.7), and regard $t$ as equal to $\sum_{i=1}^{k} i\,r_i$. Replace the various
factorials by their asymptotic approximations as given by Stirling's formula and
simplify the resulting expression. The subsequent procedure is simple but
laborious and we omit the details, which are like those of the classical proof of
De Moivre's theorem as given, for example, in Fréchet [1], p. 89.

We now prove the following theorem on additive partition functions:

Theorem 2: Let $f(x)$ be a function defined for all positive integral values of $x$
which fulfills the following conditions:

(a) There exists a pair of positive integers, $a$ and $b$, such that

$$\frac{f(a)}{f(b)} \neq \frac{a}{b}, \tag{2.17}$$

(b) the series

$$\sum_{i=1}^{\infty} |f(i)|\, 2^{-i/2} \tag{2.18}$$

converges. Let $F(A)$, a function of the stochastic sequence $A$, be defined as follows:

$$F(A) = \sum_{i=1}^{s} f(a_i). \tag{2.19}$$

Then for any real $y$ the probability of the inequality $\bar{F}(A) < y$ approaches

$$\frac{1}{\sqrt{2\pi}} \int_{-\infty}^{y} e^{-\frac{1}{2} y^2}\, dy$$

as $n \to \infty$.

We restate this theorem without use of probabilistic terms:

Let $A$ be any sequence of positive integers whose sum is a given integer $n$.
Consider two sequences $A$ to be different if they contain different elements or
the same elements arranged in a different order. Let $f(x)$ and $F(A)$ be defined
as above, with the aforementioned restrictions. Then there exist, for every
positive integer $n$, two numbers $E_n$ and $\sigma_n$, such that $2^{-n+1}$ multiplied by the
number of sequences $A$ for which the inequality

$$F(A) - E_n < y\,\sigma_n$$

holds, approaches

$$\frac{1}{\sqrt{2\pi}} \int_{-\infty}^{y} e^{-\frac{1}{2} y^2}\, dy$$

as $n \to \infty$.
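The content of Theorem 2 can be explored numerically by drawing sequences $A$ at random. The sketch below is illustrative only and rests on assumptions made here: the function names are hypothetical, a uniformly distributed $A$ is produced by cutting independently with probability 1/2 at each of the $n-1$ interior points, and $f(x) = x^2$ is chosen merely as one function satisfying (2.17) and (2.18). The values are standardized with the sample mean and standard deviation in place of $E_n$ and $\sigma_n$.

```python
import random


def random_composition(n, rng):
    """Draw one sequence A uniformly from the 2^(n-1) sequences summing to n."""
    parts, current = [], 1
    for _ in range(n - 1):
        if rng.random() < 0.5:    # cut here: close the current part
            parts.append(current)
            current = 1
        else:
            current += 1
    parts.append(current)
    return parts


def additive_function(parts, f):
    """F(A) = sum of f(a_i) over the parts of A, as in (2.19)."""
    return sum(f(a) for a in parts)


def standardized_sample(n, f, draws, seed=0):
    """Sample F(A) repeatedly and standardize by the sample mean and deviation."""
    rng = random.Random(seed)
    values = [additive_function(random_composition(n, rng), f) for _ in range(draws)]
    mean = sum(values) / draws
    deviation = (sum((v - mean) ** 2 for v in values) / draws) ** 0.5
    return [(v - mean) / deviation for v in values]


if __name__ == "__main__":
    z = standardized_sample(400, lambda x: x * x, 2000)
    below_one = sum(1 for v in z if v < 1.0) / len(z)
    # For large n this fraction should be close to the normal value at y = 1.
    print("fraction of standardized values below 1:", below_one)
```

A simulation of this kind only suggests the limiting behavior; it does not replace the proof that follows.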

For convenience, the proof will be divided into a number of lemmas.

If $\varphi(y)$ is any continuous distribution function, then it is well known that $\varphi(y)$
is uniformly continuous and that consequently, for any arbitrarily small, positive
$\epsilon$, there exist two positive numbers, $h$ and $D$, with the following properties:

(a) If $y_1$ and $y_2$ are any real numbers such that $|y_1 - y_2| < h$, then
$|\varphi(y_1) - \varphi(y_2)| < \epsilon$,

(b) If $y$ is such that $|y| > D$, then $\varphi(|y|) > 1 - \epsilon$ and $\varphi(-|y|) < \epsilon$.

We now first prove

Lemma 1: Let $X$ and $Y$ be two stochastic variables, both of which possess finite
means and variances. Suppose that there exists a continuous distribution function
$\varphi(y)$ and two small positive numbers $\epsilon$ and $\delta$ (say $\epsilon < 1/10$, $\delta < 1/10$), such that

$$|P\{\bar{X} < y\} - \varphi(y)| < \epsilon \tag{2.20}$$

for all $y$, and

$$\frac{\sigma(Y)}{\sigma(X)} = \delta. \tag{2.21}$$


Let h and D be chosen as above for <p{y), with the additional proviso that h < h 
and D > 1 Suppose further that 

(2.22) a<mm(i,|). 

Then

$$|P\{\overline{(X + Y)} < y\} - \varphi(y)| < 3\epsilon \tag{2.23}$$

for all $y$.


Proof. We have

$$\sigma^2(X + Y) = \sigma^2(X) + 2\sigma(XY) + \sigma^2(Y).$$

Since, as is well known,

$$|\sigma(XY)| \leq \sigma(X)\,\sigma(Y),$$

it follows from (2.21) that

$$\sigma(X + Y) = (1 + \delta')\,\sigma(X), \tag{2.24}$$

where $|\delta'| \leq \delta$. Hence


(2.26) 


From Tchebycheff's inequality and (2.21) it then follows that, if $d = h/4$,
(2.26) P 


IHX + r) 1 j d>' 


and 

(2.27) 

Now 


45* 


< t* < e. 




i 

> d 


(2.28) 


fA'-/f(.Y) ir-Axy) 

\c{X + 1’) ^ 1 <riX + r) 

<P|(Y+ r) <yi +< 


< p 

U(Y + y) 

Hence, from (2.24),

$$P\{\bar{X} < (y - d)(1 + \delta')\} - \epsilon < P\{\overline{(X + Y)} < y\} < P\{\bar{X} < (y + d)(1 + \delta')\} + \epsilon,$$

and consequently, from (2.20),

$$\varphi(y - d + y\delta' - d\delta') - 2\epsilon < P\{\overline{(X + Y)} < y\} < \varphi(y + d + y\delta' + d\delta') + 2\epsilon. \tag{2.29}$$

Now if $|y| < 2D$, then from (2.22)

$$d + |y\delta'| + d\,|\delta'| < \frac{h}{4} + \frac{h}{2} + \frac{h}{4} = h, \tag{2.30}$$

and if $|y| > 2D$, then also from (2.22)

$$|y| - d - |y\delta'| - d\,|\delta'| > |y|(1 - \delta) - \frac{h}{2} > \tfrac{1}{2}|y| > D.$$

Recalling the definitions of $h$ and $D$, it follows from (2.30) that, for all $y$,

$$\varphi(y) - 3\epsilon < P\{\overline{(X + Y)} < y\} < \varphi(y) + 3\epsilon. \tag{2.31}$$

This proves Lemma 1.

Lemma 2: For any fixed pair $a$, $b$, of positive integers such that $a < b$,

$$[E(r)]^{b-a}\, [E(r_a)]^{-b}\, [E(r_b)]^{a} \to 1 \tag{2.32}$$

as $n \to \infty$.

Proof. From (2.8), for fixed $i$,

$$\frac{1}{n} E(r_i) \to 2^{-i-1},$$

and from (2.14) $\frac{1}{n} E(r) \to \frac{1}{2}$ as $n \to \infty$. The required result follows easily.

For any $n$ we now define

$$B(k;n) = \sum_{i=1}^{k} r_i\, f(i),$$

and

$$C(k;n) = \sum_{i=k+1}^{n} r_i\, f(i).$$

Then

$$F(A) = B(k;n) + C(k;n).$$

Lemma 3: For any real $y$ and any fixed positive integral $k$ the probability that
the stochastic variable $\bar{B}(k;n)$ shall fulfill the inequality $\bar{B}(k;n) < y$ approaches

$$\frac{1}{\sqrt{2\pi}} \int_{-\infty}^{y} e^{-\frac{1}{2} y^2}\, dy$$

as $n \to \infty$.

Proof. By Theorem 1, the stochastic variables $\bar{r}_1, \bar{r}_2, \ldots, \bar{r}_k, \bar{r}_{(k+1)}$ are
asymptotically jointly normally distributed. As an immediate consequence so
are the variables $\bar{r}_1, \bar{r}_2, \ldots, \bar{r}_k$, and hence $B(k;n)$, which is a linear function
with constant coefficients $f(1), f(2), \ldots, f(k)$, of $r_1, r_2, \ldots, r_k$, is asymptotically
normally distributed.
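The decomposition $F(A) = B(k;n) + C(k;n)$ is a regrouping of the sum (2.19) by part size. A minimal sketch follows; the function name `split_additive_function` is an assumption introduced for illustration.

```python
from collections import Counter


def split_additive_function(parts, f, k):
    """Return (B, C): the contributions to F(A) of parts <= k and of parts > k."""
    counts = Counter(parts)          # counts[i] is r_i, the number of parts equal to i
    head = sum(r * f(i) for i, r in counts.items() if i <= k)
    tail = sum(r * f(i) for i, r in counts.items() if i > k)
    return head, tail


if __name__ == "__main__":
    parts = (1, 3, 1, 5, 2)
    square = lambda x: x * x
    head, tail = split_additive_function(parts, square, k=2)
    # The two pieces always add up to F(A) computed directly from (2.19).
    assert head + tail == sum(square(a) for a in parts)
    print(head, tail)
```

In the proof, $B(k;n)$ carries the asymptotically normal part and $C(k;n)$ is shown to be negligible for large $k$.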

Lemma 4: There exists a constant $c > 0$, such that, for all $n$ sufficiently large,

$$\sigma^2(F(A)) > cn. \tag{2.33}$$

Proof. For any sufficiently large, arbitrary, but fixed $n$, we will construct
two sets, $S_1$ and $S_2$, of sequences $A$, with the following properties: $S_1$ and $S_2$
have the same probability $p$, with $p$ always greater than $\beta$, a fixed positive
constant which does not depend on $n$. Since the probabilities of $S_1$ and $S_2$ are
equal, each possesses the same number of sequences $A$. Between the member
sequences of the sets $S_1$ and $S_2$ we will establish a one-to-one correspondence
such that, if $A_1$ is a member of $S_1$ and $A_2$ is its corresponding sequence in $S_2$,
then

$$|F(A_1) - F(A_2)| \geq 2d\sqrt{n}, \tag{2.34}$$

where $d$ is a fixed positive constant which does not depend on $n$.

It is easy to see that such a construction would prove the lemma. The
probability of any sequence $A$ is $2^{-n+1}$. Hence the contribution of a corresponding
pair $A_1$ and $A_2$ to the variance of $F(A)$ is by (2.34) not less than
$2^{-n+2} d^2 n$ and the contribution of the sets $S_1$ and $S_2$ is not less than $2\beta d^2 n$.

It remains then to carry out the construction of $S_1$ and $S_2$. For the sake of
simplicity in notation, we shall carry out the construction with the assumption
that the integers $a$ and $b$ of (2.17) are 1 and 2. It will be readily apparent,
however, that the proof is perfectly general and with trivial changes holds for
any pair $a$, $b$. This lemma is the only place where the hypothesis (2.17) is used.
The latter condition is necessary because, if for every pair of positive integers
$i$ and $j$,

$$\frac{f(i)}{f(j)} = \frac{i}{j},$$

then $F(A)$ is a constant multiple of $n$, for then

$$F(A) = \sum_{i} f(a_i) = \sum_{i} r_i\, f(i) = f(1) \sum_{i} i\, r_i = n f(1).$$
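The necessity of hypothesis (2.17) can be seen on a small case: when $f(x)$ is proportional to $x$, every sequence $A$ gives the same value of $F(A)$, so no limiting distribution with positive variance can exist. The sketch below is illustrative; the helper names are assumptions made here.

```python
from itertools import product


def compositions(n):
    """Yield every ordered sequence of positive integers summing to n."""
    for cuts in product((0, 1), repeat=n - 1):
        parts, current = [], 1
        for cut in cuts:
            if cut:
                parts.append(current)
                current = 1
            else:
                current += 1
        parts.append(current)
        yield tuple(parts)


def additive_values(n, f):
    """Set of distinct values taken by F(A) over all sequences A summing to n."""
    return {sum(f(a) for a in parts) for parts in compositions(n)}


if __name__ == "__main__":
    # f(x) = 3x violates (2.17): F(A) = 3n for every A.
    assert additive_values(8, lambda x: 3 * x) == {24}
    # f(x) = x^2 satisfies (2.17): F(A) genuinely varies with A.
    assert len(additive_values(8, lambda x: x * x)) > 1
    print("F(A) is constant exactly in the proportional case shown")
```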

Each sequence $A$ uniquely determines the "coordinate" complex

$$(r_1, r_2, \ldots, r_n)$$

which we prefer to write as the pair $L = (l, l')$:

$$l = \{r_1, r_2\},$$

$$l' = \{r_3, \ldots, r_n\}.$$

To each pair $(l, l')$ there correspond in general many sequences $A$ whose exact
number may be explicitly given in terms of factorials. The totality of all $A$
whose $L$ have the same second member $l'$ will be called the group determined
by $l'$, or just the group $l'$. The subset of a group $l'$ all of whose $A$ have the
same $r_1$ will be called the family $(l', r_1)$. All the $A$ in the same family have the
same $L$. For $l'$ and $r_1$ determine $r_2$ through the equation $\sum_i i\,r_i = n$.

According to Theorem 1 for $k = 2$, $\bar{r}_1$, $\bar{r}_2$, $\bar{r}_{(3)}$ are asymptotically jointly
normally distributed. Let

$$\sigma_1 = \lim_{n \to \infty} \frac{\sigma(r_1)}{\sqrt{n}}.$$

The limiting variances of $r_2$ and $r_{(3)}$ are constant multiples of $n\sigma_1^2$. Therefore
the set $H$ of all $A$ whose $L$ satisfy the constraints

$$\frac{n}{4} < r_1 < \frac{n}{4} + \sqrt{n}\,\sigma_1, \qquad \frac{n}{8} < r_2 < \frac{n}{8} + \sqrt{n}\,\sigma_1, \qquad \frac{n}{8} < r_{(3)} < \frac{n}{8} + \sqrt{n}\,\sigma_1 \tag{2.35}$$

has, by virtue of the fact that the limiting correlation coefficients of the variables
$\bar{r}_1$, $\bar{r}_2$, $\bar{r}_{(3)}$ are all less than 1 in absolute value, a positive probability, which
exceeds a fixed positive constant $\gamma$ for sufficiently large $n$. If any member
sequence $A$ of a family is in $H$, the entire family is obviously in $H$. Any sequence
$A$ belongs to one and only one family. Hence the set $H$ may be decomposed
in a disjunct way into entire families. Let $\left( l', \frac{n}{4} + h_1 \right)$ be any family
in $H$, where of course $0 < h_1 < \sqrt{n}\,\sigma_1$. Consider the (second) family
$\left( l', \frac{n}{4} + 2\sqrt{n}\,\sigma_1 + h_1 \right)$. This family is not in $H$. We now wish to show that
the probability of the second family exceeds $c'$ times the probability of the first
family, where $c'$ is a fixed positive constant which does not depend on either $n$
or the particular families in question.

For the first family, let

$$r_1 = \frac{n}{4} + h_1, \qquad r_2 = \frac{n}{8} + h_2, \qquad r_{(3)} = \frac{n}{8} + h_3, \qquad r = \frac{n}{2} + h_1 + h_2 + h_3.$$

Hence

$$0 < h_i < \sqrt{n}\,\sigma_1 \qquad (i = 1, 2, 3). \tag{2.36}$$

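For the second family we therefore have, since both families are in the same
group,

$$r_1 = \frac{n}{4} + 2\sqrt{n}\,\sigma_1 + h_1, \qquad r_2 = \frac{n}{8} - \sqrt{n}\,\sigma_1 + h_2, \qquad r_{(3)} = \frac{n}{8} + h_3,$$

$$r = \frac{n}{2} + \sqrt{n}\,\sigma_1 + h_1 + h_2 + h_3.$$

The ratio of the probability of the second family to that of the first family
equals the ratio of the number of sequences $A$ in the second family to the
number of sequences $A$ in the first family. By elementary combinatorics, since
both families are in the same group, the latter ratio is

$$\frac{\left( \frac{n}{2} + \sqrt{n}\,\sigma_1 + h_1 + h_2 + h_3 \right)!\, \left( \frac{n}{4} + h_1 \right)!\, \left( \frac{n}{8} + h_2 \right)!}{\left( \frac{n}{4} + 2\sqrt{n}\,\sigma_1 + h_1 \right)!\, \left( \frac{n}{8} - \sqrt{n}\,\sigma_1 + h_2 \right)!\, \left( \frac{n}{2} + h_1 + h_2 + h_3 \right)!} \tag{2.37}$$

and hence exceeds

$$\left( \frac{n}{2} + h_1 + h_2 + h_3 \right)^{\sqrt{n}\sigma_1} \left( \frac{n}{4} + 2\sqrt{n}\,\sigma_1 + h_1 \right)^{-2\sqrt{n}\sigma_1} \left( \frac{n}{8} - \sqrt{n}\,\sigma_1 + h_2 \right)^{\sqrt{n}\sigma_1}. \tag{2.38}$$

At this point, if we had been using the numbers $a$ and $b$ of (2.17), we would
make use of Lemma 2. In the present case the result of that lemma is trivial.
It is easy to see, therefore, that (2.38) equals

$$\left( 1 + \frac{2h_1 + 2h_2 + 2h_3}{n} \right)^{\sqrt{n}\sigma_1} \times \left( 1 + \frac{8\sqrt{n}\,\sigma_1 + 4h_1}{n} \right)^{-2\sqrt{n}\sigma_1} \left( 1 - \frac{8\sqrt{n}\,\sigma_1 - 8h_2}{n} \right)^{\sqrt{n}\sigma_1} \tag{2.39}$$

which, in view of (2.36), exceeds

$$\left( 1 + \frac{12\sigma_1}{\sqrt{n}} \right)^{-2\sqrt{n}\sigma_1} \left( 1 - \frac{8\sigma_1}{\sqrt{n}} \right)^{\sqrt{n}\sigma_1} \tag{2.40}$$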
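The count of sequences $A$ sharing one coordinate complex $(r_1, \ldots, r_n)$, on which a ratio such as (2.37) rests, is the multinomial coefficient $r!/(r_1! \cdots r_n!)$. The sketch below checks this by enumeration; it is illustrative only, and the names `family_size` and `families` are assumptions introduced here.

```python
from collections import Counter
from itertools import product
from math import factorial


def compositions(n):
    """Yield every ordered sequence of positive integers summing to n."""
    for cuts in product((0, 1), repeat=n - 1):
        parts, current = [], 1
        for cut in cuts:
            if cut:
                parts.append(current)
                current = 1
            else:
                current += 1
        parts.append(current)
        yield tuple(parts)


def family_size(counts):
    """r! / (r_1! r_2! ... r_n!) for multiplicities given as {part: r_part}."""
    size = factorial(sum(counts.values()))
    for multiplicity in counts.values():
        size //= factorial(multiplicity)
    return size


def families(n):
    """Tally the sequences A of n by their coordinate complex."""
    tally = Counter()
    for parts in compositions(n):
        key = tuple(sorted(Counter(parts).items()))
        tally[key] += 1
    return tally


if __name__ == "__main__":
    tally = families(8)
    for key, observed in tally.items():
        assert family_size(dict(key)) == observed
    assert sum(tally.values()) == 2 ** 7
    print("multinomial counts agree with enumeration")
```

Within one group only $r$, $r_1$ and $r_2$ change between the two families, which is why the ratio reduces to a quotient of six factorials.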


which, in turn, for sufficiently large n, exceeds 
(2 41) = c'. 

We are now ready to construct $S_1$ and $S_2$. Let

$$f_1 = (l', r_1)$$

be any family in $H$ and consider the family

$$f_2 = (l', r_1 + 2\sqrt{n}\,\sigma_1).$$

Select in any manner whatsoever $c'v$ of the sequences $A$ in $f_1$, where $v$ is the
total number of sequences in $f_1$. Call this set of sequences $f^*$. Select in any
manner whatsoever $c'v$ sequences from $f_2$ and call this set $f^{**}$. That there exist
at least $c'v$ sequences in $f_2$ is assured by equation (2.41). In any manner whatsoever
establish a one-to-one correspondence between the sequences of $f^*$ and
$f^{**}$. Suppose $A_1$ and $A_2$ are corresponding sequences. Since $f^*$ and $f^{**}$ belong
to the same group, and since $f(2) \neq 2f(1)$, we have

$$|F(A_1) - F(A_2)| = |f(2)\sqrt{n}\,\sigma_1 - 2f(1)\sqrt{n}\,\sigma_1| = |f(2) - 2f(1)|\,\sqrt{n}\,\sigma_1, \tag{2.42}$$

so that (2.34) holds with

$$d = \tfrac{1}{2}\,|f(2) - 2f(1)|\,\sigma_1. \tag{2.43}$$

Now proceed in this manner for all the families $f_1$ in $H$. The union of all the
sets $f^*$ is the set $S_1$ and the union of all the sets $f^{**}$ is the set $S_2$. It is clear
that, since the probability of $H$ exceeds $\gamma$, the probability $p$ of $S_1$ exceeds
$\beta = c'\gamma$. This proves Lemma 4.

Lemma 5: For any arbitrarily small positive number $\zeta$ there exists a positive
integer $\mu(\zeta)$, such that for any $k > \mu(\zeta)$ and all $n$ greater than a fixed lower bound,

$$\sigma^2[C(k;n)] < \zeta n. \tag{2.44}$$

Proof. Since

$$C(k;n) = \sum_{i=k+1}^{n} r_i\, f(i),$$

and, as is well known,

$$|\sigma(XY)| \leq \sigma(X)\,\sigma(Y),$$

we have

$$\sigma^2[C(k;n)] \leq \left[ \sum_{i=k+1}^{n} |f(i)|\, \sigma(r_i) \right]^2. \tag{2.45}$$

From (2.10) it follows readily that, for $i \geq 2$,

$$\sigma^2(r_i) < \frac{n}{2^{i+1}} + \left( \frac{3 - i}{2^{i+1}} + \frac{3i^2 - 12i + 5}{2^{2i+2}} \right), \tag{2.46}$$

and the quantity in parentheses in the right member of (2.46) is easily seen to
be negative (for $i \geq 3$), so that, for $i < \tfrac{1}{2} n$ and $n > 3$,

$$\sigma(r_i) < \sqrt{n}\; 2^{-i/2}. \tag{2.47}$$

From (2.11) and the definition of $r_i$, it follows easily that (2.47) holds also
when $i > \tfrac{1}{2} n$ and $n > 3$.

Hence, in view of (2.12), (2.13), and the convergence of the series in (2.18),
the desired result follows from (2.45).

Lemma 6: Let the $\zeta$ of Lemma 5 be $< \tfrac{1}{4} c$, where $c$ is as in Lemma 4. Then
for $k > \mu(\zeta)$ and $n$ larger than a fixed lower bound

$$\sigma^2(B(k;n)) > \tfrac{1}{4} cn. \tag{2.48}$$

Proof: Since

$$F(A) = B(k;n) + C(k;n),$$

we have

$$\sigma^2(F(A)) = \sigma^2(B(k;n)) + \sigma^2(C(k;n)) + 2\sigma(BC) \leq \sigma^2(B) + \sigma^2(C) + 2\sigma(B)\,\sigma(C) = (\sigma(B) + \sigma(C))^2.$$

Hence from (2.33) and (2.44) $\sqrt{cn} < \sigma(B) + \tfrac{1}{2}\sqrt{cn}$ and the required result
follows.

Proof of the Theorem: Let $\epsilon$ be an arbitrarily small positive number. For
all $n$ sufficiently large we have, by Lemma 3,

$$\left| P\{\bar{B}(k;n) < y\} - \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{y} e^{-\frac{1}{2} y^2}\, dy \right| < \epsilon,$$

for all $y$. For a small $\zeta$ to be chosen later and large enough $k$ and $n$ we have,
by Lemmas 5 and 6,

$$\delta = \frac{\sigma(C(k;n))}{\sigma(B(k;n))} < 2\sqrt{\frac{\zeta}{c}}. \tag{2.49}$$

Now let the $\varphi(y)$ of Lemma 1 be defined as

$$\varphi(y) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{y} e^{-\frac{1}{2} y^2}\, dy,$$

and choose $h$ and $D$ as in Lemma 1 for our present $\epsilon$. Since $c$ is fixed and $\zeta$
still at our disposal, choose $\zeta$ sufficiently small so that the $\delta$ of (2.49) satisfies
(2.22). Since the hypothesis of Lemma 1 is satisfied, we have, from (2.23) and
Lemma 3, for all $n$ sufficiently large,

$$|P\{\bar{F}(A) < y\} - \varphi(y)| < 3\epsilon$$

for all $y$. This is the required result.


3. Partitions of two integers. Let $n_1$ and $n_2$ be positive integers, $n_1 + n_2 = n$,
$\frac{n_1}{n} = e_1$, $\frac{n_2}{n} = e_2$, and $e = \max(e_1, e_2)$. Let $V = (v_1, v_2, \ldots, v_s)$ be any sequence
of positive integers $v_i$ $(i = 1, 2, \ldots, s)$ where $v_1 + v_3 + v_5 + \cdots$ equals either one
of $n_1$ and $n_2$, while $v_2 + v_4 + v_6 + \cdots$ equals the other. Such sequences are of
statistical importance (cf. Wald and Wolfowitz [2]). As before, sequences $V$
with different elements or with the same elements in different order will be considered
different and to each sequence $V$ will be assigned the same probability,
which is therefore easily seen to be

$$\frac{n_1!\, n_2!}{n!}.$$

Let $r_{1i}$ be the number of elements equal to $i$ in that one of the two sequences
$(v_1, v_3, v_5, \ldots)$ and $(v_2, v_4, v_6, \ldots)$ the sum of whose elements is $n_1$ and let
$r_{2i}$ be the corresponding number for the other sequence. Let

$$s_i = r_{1i} + r_{2i},$$

$$r_1 = \sum_{i} r_{1i}, \qquad r_2 = \sum_{i} r_{2i},$$

$$s = r_1 + r_2, \qquad r_{1(k+1)} = \sum_{i=k+1}^{n_1} r_{1i}, \qquad r_{2(k+1)} = \sum_{i=k+1}^{n_2} r_{2i}.$$

The necessary computations such as are given in the beginning of the previous
section have been performed by Mood [3] and we summarize them as follows:

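Theorem 3 (Mood): As $n$ approaches infinity while $e_1$ and $e_2$ remain constant,
the joint distribution of the stochastic variables

$$\bar{r}_{11}, \bar{r}_{12}, \ldots, \bar{r}_{1k}, \bar{r}_{1(k+1)}, \bar{r}_{21}, \bar{r}_{22}, \ldots, \bar{r}_{2k}$$

(where $k$ is any fixed positive integer), approaches the multivariate normal distribution.

Mood (loc. cit.) gives the following parameters, with the convention that

$$x^{(i)} = x(x - 1)(x - 2) \cdots (x - i + 1): \tag{3.1}$$

$$E(r_{1i}) = \frac{n_1^{(i)}\,(n_2 + 1)^{(2)}}{n^{(i+1)}}, \tag{3.2}$$

$$\lim_{n \to \infty} \frac{E(r_{1i})}{n} = e_1^{i}\, e_2^{2}, \tag{3.3}$$

$$\lim_{n \to \infty} \frac{E(r_{1(k+1)})}{n} = e_1^{k+1}\, e_2, \tag{3.4}$$

$$\sigma^2(r_{1i}) = \frac{n_1^{(2i)}\,(n_2 + 1)^{(2)}\, n_2^{(2)}}{n^{(2i+2)}} + \frac{(n_2 + 1)^{(2)}\, n_1^{(i)}}{n^{(i+1)}} \left( 1 - \frac{(n_2 + 1)^{(2)}\, n_1^{(i)}}{n^{(i+1)}} \right), \tag{3.5}$$

$$\lim_{n \to \infty} \frac{\sigma^2(r_{1i})}{n} = e_1^{2i-1}\, e_2^{3} \left[ (i + 1)^2 e_1 e_2 - i^2 e_2 - 2 e_1 \right] + e_1^{i}\, e_2^{2}. \tag{3.6}$$

The corresponding parameters for $r_{2i}$ may be obtained from the above by interchange
of $n_1$ and $n_2$. Also

$$\lim_{n \to \infty} \frac{E(r_1)}{n} = \lim_{n \to \infty} \frac{E(r_2)}{n} = e_1 e_2. \tag{3.7}$$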
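The finite-sample expressions (3.2) and (3.5) can be compared with exact enumeration over all arrangements of $n_1$ zeros and $n_2$ ones, counting runs of zeros of length exactly $i$. The sketch below is an illustration under stated assumptions: the function names are introduced here, arrangements of the two kinds of elements are taken as equally likely, and the check is restricted to $n \geq 2i + 2$ so that the denominator $n^{(2i+2)}$ in (3.5) is not zero.

```python
from fractions import Fraction
from itertools import combinations, groupby


def falling(x, i):
    """x^(i) = x (x - 1) ... (x - i + 1), the convention of (3.1)."""
    result = 1
    for j in range(i):
        result *= x - j
    return result


def zero_run_counts(n1, n2, i):
    """For each arrangement, the number of runs of zeros of length exactly i."""
    n = n1 + n2
    counts = []
    for zero_positions in combinations(range(n), n1):
        zeros = set(zero_positions)
        labels = [0 if p in zeros else 1 for p in range(n)]
        runs = [(label, len(list(group))) for label, group in groupby(labels)]
        counts.append(sum(1 for label, length in runs if label == 0 and length == i))
    return counts


def mood_mean(n1, n2, i):
    """Right member of (3.2)."""
    return Fraction(falling(n1, i) * falling(n2 + 1, 2), falling(n1 + n2, i + 1))


def mood_variance(n1, n2, i):
    """Right member of (3.5); requires n1 + n2 >= 2i + 2."""
    n = n1 + n2
    mean = mood_mean(n1, n2, i)
    pairs = Fraction(falling(n1, 2 * i) * falling(n2 + 1, 2) * falling(n2, 2),
                     falling(n, 2 * i + 2))
    return pairs + mean * (1 - mean)


if __name__ == "__main__":
    counts = zero_run_counts(3, 2, 1)
    mean = Fraction(sum(counts), len(counts))
    variance = sum((c - mean) ** 2 for c in counts) / len(counts)
    assert mean == mood_mean(3, 2, 1)
    assert variance == mood_variance(3, 2, 1)
    print(mean, variance)
```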


For additive partition functions we have the following theorem:

Theorem 4: Let $f(x)$ be a function defined for all positive integral values of $x$
which fulfills the following conditions:

(a) There exists a pair of positive integers, $a$ and $b$, such that

$$\frac{f(a)}{f(b)} \neq \frac{a}{b}, \tag{3.8}$$

(b) the series

$$\sum_{i=1}^{\infty} |f(i)|\, e^{i/2} \tag{3.9}$$

converges. Let $F(V)$, a function of the stochastic sequence $V$, be defined as follows:

$$F(V) = \sum_{j=1}^{s} f(v_j). \tag{3.10}$$

Then for any real $y$ the probability of the inequality $\bar{F}(V) < y$ approaches

$$\frac{1}{\sqrt{2\pi}} \int_{-\infty}^{y} e^{-\frac{1}{2} y^2}\, dy$$

as $n \to \infty$, while $e_1$ and $e_2$ remain constant.

The basic idea of the proof of this theorem is the same as that of the proof
of Theorem 2. We omit all the steps which can be written without difficulty
by analogy to those in Theorem 2 and present only those where some major
change is necessary. The numbering of the lemmas will correspond to that of
Theorem 2.

Lemma 2: For any fixed pair, $a$ and $b$, of positive integers such that $a < b$,

$$[E(r_1)]^{b-a}\, [E(r_2)]^{b-a}\, [E(r_{1a})]^{-b}\, [E(r_{2a})]^{-b}\, [E(r_{1b})]^{a}\, [E(r_{2b})]^{a} \to 1 \tag{3.11}$$

as $n \to \infty$.

The proof is the same as before.

The following are the definitions corresponding to those of Theorem 2:

$$B(k;n) = \sum_{i=1}^{k} s_i\, f(i),$$

$$C(k;n) = \sum_{i=k+1}^{n} s_i\, f(i).$$

Then as before

$$F(V) = B(k;n) + C(k;n).$$

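Lemma 4: Statement is the same as that for Theorem 2. The following important
changes must be made in the proof:

Each sequence $V$ determines the coordinate complex

$$r_{11}, r_{12}, \ldots, r_{1n},$$

$$r_{21}, r_{22}, \ldots, r_{2n},$$

also

$$l = \begin{Bmatrix} r_{11}, & r_{12} \\ r_{21}, & r_{22} \end{Bmatrix}, \qquad l' = \{ r_{13}, \ldots, r_{1n};\; r_{23}, \ldots, r_{2n} \}.$$

The set $H$ is the set of all $V$ whose $L$ satisfy the constraints

$$n e_1 e_2^2 < r_{11} < n e_1 e_2^2 + \sqrt{n}\,\sigma_{11},$$

$$n e_1^2 e_2^2 < r_{12} < n e_1^2 e_2^2 + \sqrt{n}\,\sigma_{11},$$

$$n e_1^2 e_2 < r_{21} < n e_1^2 e_2 + \sqrt{n}\,\sigma_{11},$$

$$n e_1^2 e_2^2 < r_{22} < n e_1^2 e_2^2 + \sqrt{n}\,\sigma_{11},$$

$$n e_1^3 e_2 < r_{1(3)} < n e_1^3 e_2 + \sqrt{n}\,\sigma_{11},$$

where

$$\sigma_{11} = \lim_{n \to \infty} \frac{\sigma(r_{11})}{\sqrt{n}}.$$

The representative family for $H$ is characterized by

$$(l',\; n e_1 e_2^2 + h_{11}),$$

and this family is compared with the family

$$(l',\; n e_1 e_2^2 + 2\sqrt{n}\,\sigma_{11} + h_{11}).$$

For the members of the family in $H$

$$r_{11} = n e_1 e_2^2 + h_{11} = n m_{11} + h_{11},$$

$$r_{12} = n e_1^2 e_2^2 + h_{12} = n m_{12} + h_{12},$$

$$r_{21} = n e_1^2 e_2 + h_{21} = n m_{21} + h_{21},$$

$$r_{22} = n e_1^2 e_2^2 + h_{22} = n m_{22} + h_{22},$$

$$r_{1(3)} = n e_1^3 e_2 + h_{13} = n m_{13} + h_{13},$$

$$r_1 = n e_1 e_2 + h = n m + h,$$

where

$$|r_2 - r_1| \leq 1,$$

$$0 < h_{ij} < \sqrt{n}\,\sigma_{11}, \tag{3.12}$$

$$h = h_{11} + h_{12} + h_{13}. \tag{3.13}$$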

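And for the members of the second family

$$r_{11} = n m_{11} + 2\sqrt{n}\,\sigma_{11} + h_{11},$$

$$r_{12} = n m_{12} - \sqrt{n}\,\sigma_{11} + h_{12},$$

$$r_{21} = n m_{21} + 2\sqrt{n}\,\sigma_{11} + h_{21} + \theta_{21},$$

$$r_{22} = n m_{22} - \sqrt{n}\,\sigma_{11} + h_{22} + \theta_{22},$$

$$r_{1(3)} = n m_{13} + h_{13},$$

$$r_1 = n m + \sqrt{n}\,\sigma_{11} + h,$$

$$|r_2 - r_1| \leq 1,$$

with

$$|\theta_{21}| \leq 1, \qquad |\theta_{22}| \leq 1.$$

To the expression (2.37) corresponds the expression (3.14), with $|\theta| \leq 1$:

$$\frac{(n m_{11} + h_{11})!\,(n m_{12} + h_{12})!\,(n m_{21} + h_{21})!\,(n m_{22} + h_{22})!}{(n m + h)!\,(n m + h)!} \times \frac{(n m + h + \sqrt{n}\,\sigma_{11})!}{(n m_{11} + 2\sqrt{n}\,\sigma_{11} + h_{11})!\,(n m_{12} - \sqrt{n}\,\sigma_{11} + h_{12})!} \times \frac{(n m + h + \sqrt{n}\,\sigma_{11} + \theta)!}{(n m_{21} + 2\sqrt{n}\,\sigma_{11} + h_{21} + \theta_{21})!\,(n m_{22} - \sqrt{n}\,\sigma_{11} + h_{22} + \theta_{22})!} \tag{3.14}$$

which exceeds

$$(n m + h)^{2\sqrt{n}\sigma_{11}} \times (n m_{11} + 2\sqrt{n}\,\sigma_{11} + h_{11})^{-2\sqrt{n}\sigma_{11}} \times (n m_{12} - \sqrt{n}\,\sigma_{11} + h_{12})^{\sqrt{n}\sigma_{11}} \times (n m_{21} + 2\sqrt{n}\,\sigma_{11} + h_{21})^{-2\sqrt{n}\sigma_{11}} \times (n m_{22} - \sqrt{n}\,\sigma_{11} + h_{22})^{\sqrt{n}\sigma_{11}}. \tag{3.15}$$

Employing Lemma 2, we find that (3.15) equals

$$\left( 1 + \frac{h}{n m} \right)^{2\sqrt{n}\sigma_{11}} \times \left( 1 + \frac{2\sqrt{n}\,\sigma_{11} + h_{11}}{n m_{11}} \right)^{-2\sqrt{n}\sigma_{11}} \times \left( 1 + \frac{-\sqrt{n}\,\sigma_{11} + h_{12}}{n m_{12}} \right)^{\sqrt{n}\sigma_{11}} \times \left( 1 + \frac{2\sqrt{n}\,\sigma_{11} + h_{21}}{n m_{21}} \right)^{-2\sqrt{n}\sigma_{11}} \times \left( 1 + \frac{-\sqrt{n}\,\sigma_{11} + h_{22}}{n m_{22}} \right)^{\sqrt{n}\sigma_{11}}. \tag{3.16}$$

In view of (3.12) and (3.13), (3.16) exceeds

$$\left( 1 + \frac{3\sqrt{n}\,\sigma_{11}}{n m_{11}} \right)^{-2\sqrt{n}\sigma_{11}} \times \left( 1 - \frac{\sqrt{n}\,\sigma_{11}}{n m_{12}} \right)^{\sqrt{n}\sigma_{11}} \times \left( 1 + \frac{3\sqrt{n}\,\sigma_{11}}{n m_{21}} \right)^{-2\sqrt{n}\sigma_{11}} \times \left( 1 - \frac{\sqrt{n}\,\sigma_{11}}{n m_{22}} \right)^{\sqrt{n}\sigma_{11}}, \tag{3.17}$$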





which, for sufficiently large n, in turn exceeds 


(3.18) 



'A + A + A + A) 

OTi2 mu m-a./ 


Lemma 5: Statement is the same as for Theorem 2. The proof then proceeds
as follows:

We have

$$\sigma(C(k;n)) \leq \sum_{j=1}^{2} \sum_{i=k+1}^{n} |f(i)|\, \sigma(r_{ji}). \tag{3.19}$$

From an examination of (3.5) and (3.6) we may see without any difficulty that
the second of the three terms of the right member of (3.5) (after removal of
parentheses) is asymptotically equal to $n$ times the last term of the right member
of (3.6) and hence that the other two terms of the right member of (3.5) are
asymptotically equal to $n$ times the right member of (3.6) without its last term.
Now when

$$\left( \frac{i + 1}{i} \right)^2 e_1 < 1,$$

which will always occur when $i$ is equal to or greater than a sufficiently large
fixed integer $\mu$, that part of the right member of (3.6) which is in square brackets
is easily seen to be negative. Hence from the definition of asymptotic equivalence
it follows that, for all $n$ sufficiently large,

$$\frac{n_1^{(2\mu)}\,(n_2 + 1)^{(2)}\, n_2^{(2)}}{n^{(2\mu+2)}} - \left( \frac{(n_2 + 1)^{(2)}\, n_1^{(\mu)}}{n^{(\mu+1)}} \right)^2 < 0 \tag{3.20}$$

and

$$\frac{(n_2 + 1)^{(2)}\, n_1^{(\mu)}}{n^{(\mu+1)}} < 2n e^{\mu+2} < 2n e^{\mu}. \tag{3.21}$$

Hence, for all $n$ sufficiently large,

$$\sigma^2(r_{1\mu}) < 2n e^{\mu}. \tag{3.22}$$

Now consider the expression (3.5) for $i = \mu$ and $i = \mu + 1$. Passage from $\mu$ to
$\mu + 1$ multiplies the first term of the right member of (3.5) by

$$\frac{(n_1 - 2\mu)(n_1 - 2\mu - 1)}{(n - 2\mu - 2)(n - 2\mu - 3)}, \tag{3.23}$$

and the third term of the right member by

$$\frac{(n_1 - \mu)^2}{(n - \mu - 1)^2}. \tag{3.24}$$

It is easy to see that for large but fixed $\mu$ and all $n$ greater than a lower bound
which is a function of $\mu$ only, the expression (3.23) is less than the expression
(3.24). Hence, in view of (3.20), the sum of the first and third terms of the
right member of (3.5) for $i = \mu + 1$ is negative. Now consider what happens
to the second term of the right member of (3.5) when $i$ goes from $\mu$ to $\mu + 1$.
It is multiplied by

$$\frac{n_1 - \mu}{n - \mu - 1}, \tag{3.25}$$

which, also for large but fixed $\mu$ and all $n$ larger than a lower bound which is a
function of $\mu$ only, is easily seen to be less than $e$. Consequently

$$\sigma^2(r_{1,\mu+1}) < 2n e^{\mu+1}. \tag{3.26}$$

It can be seen without difficulty that such a passage of (3.5) to the next higher
index is always accompanied by multiplication by expressions similar to (3.23),
(3.24), and (3.25), for which similar inequalities hold and that consequently

$$0 \leq \sigma^2(r_{1i}) < 2n e^{i}, \tag{3.27}$$

and for similar reasons

$$0 \leq \sigma^2(r_{2i}) < 2n e^{i},$$

for all $i$ not less than $\mu$ and for all $n$ greater than a lower bound which is a function
of $\mu$ only (although it may be necessary to increase the original $\mu$ so that
both the last two inequalities hold). The required result follows from (3.19)
and the convergence of the series (3.9).

The proof of Theorem 4 follows along the same lines as that of Theorem 2.

When $f(x) \equiv 1$, $F(V) \equiv U(V)$, the statistic discussed in [2]. Other such
results follow from specialization of $f(x)$. Theorem 4 may also be generalized
so that the elements $v_i$ which add up to $n_1$ are operated on by a function $f_1$,
while the elements $v_i$ which add up to $n_2$ are operated on by another function
$f_2$, but this is easy to see and we do not go into the details.
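The specialization $f(x) \equiv 1$ can be illustrated concretely. In the sketch below, which is only an illustration, a sequence $V$ of run lengths is obtained from a sequence of 0's and 1's and $F(V)$ is evaluated for a given $f$; the function names `run_lengths` and `additive_run_statistic` are assumptions introduced here.

```python
from itertools import groupby


def run_lengths(labels):
    """Lengths of the successive maximal blocks of equal labels."""
    return [len(list(group)) for _, group in groupby(labels)]


def additive_run_statistic(labels, f):
    """F(V) = sum of f(v_j) over the run lengths v_j, as in (3.10)."""
    return sum(f(length) for length in run_lengths(labels))


if __name__ == "__main__":
    labels = [0, 0, 1, 0, 1, 1, 1]
    print(run_lengths(labels))                               # the sequence V
    print(additive_run_statistic(labels, lambda x: 1))       # total number of runs
    print(additive_run_statistic(labels, lambda x: x * x))   # another choice of f
```

With $f(x) \equiv 1$ the statistic is simply the total number of runs.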


4. Tests of hypotheses in the non-parametric case. The great advances
that have been made in mathematical statistics in recent years have been in
two directions. On the one hand, the foundations of statistics, the theory of
estimation and of testing hypotheses have been put on a rigorous basis of
probability theory, and on the other, powerful methods for obtaining critical
regions and confidence intervals and criteria for appraising their efficacy have
been developed. Most of these developments have this feature in common,
that the distribution functions of the various stochastic variables which enter
into their problems are assumed to be of known functional form, and the theories
of estimation and of testing hypotheses are theories of estimation of, and of
testing hypotheses about, one or more parameters, finite in number, the knowledge
of which would completely determine the various distribution functions
involved. We shall refer to this situation for brevity as the parametric case,
and denote the opposite situation, where the functional forms of the distributions
are unknown, as the non-parametric case.



The literature of theoretical statistics, therefore, deals principally with the
parametric case. The reasons for this are perhaps partly historic, and partly
the fact that interesting results could more readily be expected to follow from
the assumption of normality. Another reason is that, while the parametric
case was for long developed on an intuitive basis, progress in the non-parametric
case requires the use of modern notions. However, the needs of theoretical
completeness and of practical research require the development of the theory
of the non-parametric case. The purpose of the following section is to contribute
to this theory.

Brief mention of some of the literature may be made here. The problem of
parametric estimation by confidence intervals was put on a rigorous foundation
by Neyman [4] and extended to the estimation of distribution functions in the
non-parametric case by means of confidence belts by Wald and Wolfowitz
[5]. Problems of testing non-parametric hypotheses have been treated in
various places. The rank correlation coefficient has been used for a long time
to test the independence of two variates. Its distribution was shown to be
asymptotically normal by Hotelling and Pabst [6] and its small sample distribution
was discussed by Olds [7]. The problem of two samples has been discussed,
among others, by Thompson [8], Dixon [9] and Wald and Wolfowitz
[2]. In 1937, Friedman [10] posed the non-parametric analogue of the problem
in the analysis of variance and proposed a very ingenious solution.

All these proposed solutions have this in common, that there exists no general
principle which can be applied in each particular case to obtain a critical region,
a role which is performed in the parametric case by Fisher's principle of maximum
likelihood and the likelihood ratio criterion (Neyman and Pearson, [11]),
whose validity, at least for large samples, has been established by Wald ([12],
[13]). In each problem the solutions proposed have been intuitive and usually
based on an analogy to the corresponding problem in the parametric case. Thus
the principal justification for the use of the rank correlation coefficient is that
its distribution is independent of the unknown distribution function (under
the null hypothesis) and that its structure resembles that of the ordinary correlation
coefficient. But any function of the order relations among the variates
(cf. [2], p. 148) has a distribution which is independent of the unknown population
distribution under the null hypothesis. The same objection may be made
to papers [8], [9], [10], [2], except that in [2], although the solution there proposed
is an intuitive one, the criterion of consistency is extended from the parametric
case to the non-parametric one. The fulfilment of this condition is a minimal
requirement of a good test and on this basis the solution proposed in one of the
previous papers cannot be considered a good one.

In the following section we shall show that the likelihood ratio criterion may
be extended to the non-parametric case where the test must be made on the
order relations among the observations and that for a certain class of these
problems which fulfill the same requirement as that for the application of the
likelihood ratio criterion in the parametric case it would thus appear to furnish
a general method by which statistics may be obtained for a specific problem.
We shall show this by applying it to the problem of two samples. This will
serve to explain the method. Another problem will be discussed later. The
ultimate justification of any statistic must be its power function, which ought
therefore to constitute the next subject of investigation for these problems.
Since for problems in the non-parametric case it is almost certain that uniformly
most powerful tests do not exist, the question of determining the alternatives
with respect to which proposed tests are powerful is particularly important.

5. The problem of two samples. Let $X$ and $Y$ be two stochastic variables
with the distribution functions $f(x)$ and $g(x)$, respectively. (The term distribution
function will always denote the cumulative distribution function. The
letter $P$ followed by an expression in braces will stand for the probability of the
relation in braces. Hence $P\{X < x\} = f(x)$ for all $x$.) $f(x)$ and $g(x)$ are
assumed continuous. The $n_1$ observations $x_1, x_2, \ldots, x_{n_1}$ and $n_2$ observations
$y_1, y_2, \ldots, y_{n_2}$ are made on $X$ and $Y$ respectively. The (null) hypothesis
to be tested is that $f(x) \equiv g(x)$. The admissible alternatives are all continuous
distribution functions $f(x)$ and $g(x)$ such that $f(x) \not\equiv g(x)$. The $n_1 + n_2 = n$
observations are arranged in ascending order of size, thus: $Z = z_1, z_2, \ldots, z_n$
where $z_1 < z_2 < \cdots < z_n$ (the probability that $z_i = z_{i+1}$ is 0). Let $V = v_1,
v_2, \ldots, v_n$ be a sequence defined as follows: $v_i = 0$ if $z_i$ is a member of the
set $x_1, x_2, \ldots, x_{n_1}$ and $v_i = 1$ if $z_i$ is a member of the set $y_1, y_2, \ldots, y_{n_2}$.
Then any statistic used to test the null hypothesis must be a function only of $V$
([2], p. 148).
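The passage from two samples to the sequence $V$ of 0's and 1's can be sketched as follows. This is an illustration only: the function name `order_pattern` is an assumption introduced here, and the observations are assumed to contain no ties, in keeping with the continuity of $f(x)$ and $g(x)$.

```python
def order_pattern(xs, ys):
    """Return V: 0 where the ordered observation came from xs, 1 where from ys."""
    tagged = [(x, 0) for x in xs] + [(y, 1) for y in ys]
    tagged.sort(key=lambda pair: pair[0])   # ascending order of size
    return [label for _, label in tagged]


if __name__ == "__main__":
    xs = [0.3, 1.2, 2.5]
    ys = [0.9, 3.1]
    print(order_pattern(xs, ys))
```

Any strictly increasing transformation applied to all the observations leaves $V$ unchanged, which is one way of seeing why a statistic for this problem should depend on $V$ alone.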

We now apply the method of Neyman and Pearson [11] as follows. $\Omega$ is the
totality of all couples $(d_1(x), d_2(x))$ of continuous distribution functions. The
set $\omega$, a subset of $\Omega$, is the totality of all couples of distribution functions for which
$d_1 \equiv d_2$. The sample space is the totality of all sequences $V$. The null hypothesis
states that $(f, g)$ is a member of $\omega$. The admissible alternatives are
that $(f, g)$ is a member of $\Omega$ not in $\omega$. The distribution of any function of $V$
is the same for all members of $\omega$. Hence this essential requirement on the
statistic to be selected for the application of the likelihood ratio criterion (cf.
[11]) is satisfied by any statistic which is a function of $V$ alone. Furthermore,
all sequences $V$ have the same probability if the null hypothesis is true ([2],
p. 149). The numerator of the likelihood ratio is therefore a function only of $n_1$
and $n_2$, is the same for all $V$, and is therefore of no further interest. Hence
$T'(V)$, a function of $V$ which is a monotonic function of the likelihood ratio
for this problem, may be defined as the denominator of the likelihood ratio,
as follows: Let $P\{V; (d_1, d_2)\}$ be the probability of $V$ when $f \equiv d_1$ and $g \equiv d_2$.
Then

$$T'(V) = \max_{\Omega} P\{V; (d_1, d_2)\}. \tag{5.1}$$

The critical values of $T'(V)$ are the large values. However, we may use instead
of $T'(V)$ a convenient monotonic function of $T'(V)$.





As an approximation to $T'(V)$ we propose $T(V)$, a statistic which is obtained
on the assumption that for a given $V$ a couple $(d_1^*, d_2^*)$ which is
essentially the same as that of the two sample distribution functions
corresponding to the particular $V$ approximates a couple which maximizes the
right member of (5.1). (We say "a" couple because it cannot be unique.) This
assumption seems a reasonable one, particularly for large samples. Only the
form of $(d_1^*, d_2^*)$ is assumed and the missing parameters are obtained in
accordance with (5.1). Before describing the matter precisely, it must be
stressed that this is offered only as a plausible approximation. For certain
extreme $V$, for example, like those where zeros and ones nearly alternate, this
is definitely not the maximizing couple. In spite of this the statistic $T(V)$
assigns to these $V$ values which are furthest removed from the critical region
for any level of significance, as indeed any good statistic should.

We first define a "run" as in [2], p. 149. A subsequence
$v_{(t+1)}, v_{(t+2)}, \cdots, v_{(t+r)}$ of $V$ (where $r$ may also be 1) is
called a "run" if $v_{(t+1)} = v_{(t+2)} = \cdots = v_{(t+r)}$ and if
$v_t \ne v_{(t+1)}$ when $t > 0$ and if $v_{(t+r)} \ne v_{(t+r+1)}$ when
$t + r < n$. Let $l_{1j}$ be the number of elements in the $j$th run of elements
0, and $l_{2j}$ the number of elements in the $j$th run of elements 1. Suppose
for a moment that the first element in $V$ is a 0. Consider the following
situation. There is an interval $[a_1, a_2]$, $a_1 < a_2$, on the line
$-\infty < x < +\infty$ such that

$P\{a_1 < X < a_2\} > 0, \quad P\{a_1 < Y < a_2\} = 0,$

$P\{X < a_1\} = P\{Y < a_1\} = 0.$

This is followed by an interval $[b_1, b_2]$, $b_1 = a_2$, such that
$P\{b_1 < X < b_2\} = 0$, $P\{b_1 < Y < b_2\} > 0$. This is in turn followed by
an interval $[a_3, a_4]$, $a_3 = b_2$, such that $P\{a_3 < X < a_4\} > 0$,
$P\{a_3 < Y < a_4\} = 0$, etc. It is clear that the lengths and location of the
intervals described are immaterial, provided only that they do not overlap.
Also the distributions of $X$ and $Y$ within each interval are immaterial,
provided only that they are continuous. All that matters for finding
$P\{V; (d_1^*, d_2^*)\}$ is that the number and the order of the disjunct
intervals shall be the same as those of the runs in $V$ (i.e., intervals of
positive probability for $X$ must alternate with intervals of positive
probability for $Y$, the number of intervals of positive probability for $X$
and for $Y$ must equal respectively the number of runs of the element 0 and the
number of runs of the element 1, and the probability of the first interval on
the left shall be positive for $X$ or for $Y$ according as the first run in $V$
is of elements 0 or of elements 1, with the same relation obtaining between the
last interval on the right and the last run in $V$) and the probability of
these intervals. Let $p_{1j}$ be the sought-for probability of the interval
which corresponds to the $j$th run of elements 0 and $p_{2j}$ the probability
of the interval which corresponds to the $j$th run of elements 1. In order to
obtain $V$, it is necessary that the elements constituting each run shall fall
into its corresponding interval. Then clearly by the multinomial theorem

(5.2)    $P\{V; (d_1^*, d_2^*)\} = \prod_i n_i! \left( \prod_j (l_{ij}!)^{-1} p_{ij}^{l_{ij}} \right)$





where $i = 1, 2$ and where, when $i$ is fixed, the product with respect to $j$
is taken over all runs of the corresponding element. The right member of (5.2)
is to be maximized with respect to the $p_{ij}$, subject of course to the
constraints

(5.3)    $\sum_j p_{ij} = 1 \quad (i = 1, 2).$

Then it may easily be verified that the maximum occurs when

(5.4)    $p_{ij} = \frac{l_{ij}}{n_i} \quad (i = 1, 2).$

For, after multiplying by a constant and taking the logarithm we introduce two
Lagrange multipliers $\mu_1$ and $\mu_2$ so that the maximizing $p_{ij}$ are
given by the equations (5.3) and those obtained by equating to zero all the
partial derivatives of

$\sum_i \sum_j (l_{ij} \log p_{ij} - \mu_i p_{ij}).$

The latter are therefore

$\frac{l_{ij}}{p_{ij}} = \mu_i$

for all $j$, whence (5.4) follows. It is easy to see that the extremum thus
obtained is a maximum and also an absolute maximum. The sought-for statistic
$T(V)$ is then the right member of (5.2) after the results (5.4) have been
inserted. It may be simplified by removing all factors which are functions only
of $n_1$ and $n_2$ (since these will then be the same for all $V$) and recalling
that

(5.5)    $\sum_j l_{ij} = n_i \quad (i = 1, 2).$

It will be convenient to take the logarithm of the resulting expression, so that
with a slight change of notation we finally have

(5.6)    $T(V) = \sum_i \sum_j \bar{l}_{ij},$

where

(5.7)    $\bar{m} = \log \frac{m^m}{m!}.$

This result is immediately extensible to the problem of $k$ samples and by way
of summary we recapitulate it as follows:

Let there be given $k$ stochastic variables $X_1, \cdots, X_k$ with the
respective distribution functions $f_1(x), \cdots, f_k(x)$, about which nothing
is known except that they are continuous. Random independent observations,
$n_i$ in number, are made on $X_i$ $(i = 1, \cdots, k)$. It is desired to test
the hypothesis that $f_1 \equiv f_2 \equiv \cdots \equiv f_k$, the admissible
alternatives being all $k$-tuples of continuous distribution functions. The
sequence $V$ is obtained from the sequence $Z$ by replacing an observation on
$X_i$ by the element $i$. Let $l_{ij}$ be the number of elements in the $j$th
run of elements $i$. Then the corresponding statistic for testing the null
hypothesis is $T_k(V)$ or any monotonic function of it, where

$T_k(V) = \sum_{i=1}^{k} \sum_j \bar{l}_{ij},$

and $\bar{l}_{ij}$ is given by (5.7). The large values of $T_k(V)$ are the
critical values.

Let $r_{ij}$ denote the number of runs of length $j$ in the elements $i$. Let
$\sum_j r_{ij} = r_i$. Of course $\sum_j j r_{ij} = n_i$. Also let
$r_j = \sum_i r_{ij}$. Then

(5.8)    $T_k(V) = \sum_i \sum_j \bar{j} r_{ij}$

and

(5.9)    $T_k(V) = \sum_j \bar{j} r_j.$

If a table were constructed of the numbers (5.7) from 1 to 50, say, or from 1
to 100, this would cover most of the cases arising in practice. The calculation
of $T_k(V)$ by means of (5.9) would then be so simple that it could be performed
very expeditiously by an ordinary clerk and with very much less labor than is
required for most statistics in common use, like the correlation coefficient,
for example. As a matter of interest we note that

$\bar{1} = 0$

$\bar{2} = .693$

$\bar{3} = 1.50$

$\bar{4} = 2.37$

$\bar{5} = 3.26$

and that

(5.10)    $\bar{p} < p$

where $p$ is any integer $> 1$. (5.10) follows from the fact that

$p! > \sqrt{2\pi p}\, p^p e^{-p}.$
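
The table of the numbers (5.7) and the computation of $T_k(V)$ from the run lengths may be sketched as follows. This is only an illustrative sketch: the names `bar`, `run_lengths` and `T_stat` are our own, and the use of `math.lgamma` to evaluate $\log m!$ is a convenience of the sketch, not something taken from the text.

```python
import math
from itertools import groupby


def bar(m: int) -> float:
    """The quantity (5.7): log(m**m / m!), with log m! taken from lgamma."""
    return m * math.log(m) - math.lgamma(m + 1)


def run_lengths(V):
    """Lengths of the maximal runs of equal elements of V, in order of occurrence."""
    return [len(list(group)) for _, group in groupby(V)]


def T_stat(V) -> float:
    """T_k(V): the sum of bar(l) over all runs of V, for any number k of samples."""
    return sum(bar(length) for length in run_lengths(V))


# The short table quoted in the text.
for m in range(1, 6):
    print(m, round(bar(m), 3))

# A hypothetical sequence V with runs of lengths 2, 3, 1.
print(round(T_stat([0, 0, 1, 1, 1, 0]), 4))
```

Because `groupby` collects consecutive equal elements, the same function serves for two samples and for $k$ samples alike.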

The distribution of $T(V)$ may be found for small samples by enumerating
the sequences $V$, all of which have the same probability under the null
hypothesis, and assigning to each $V$ its $T(V)$. The critical region consists
of the $V$'s for which $T(V)$ takes the largest values, taken in sufficient
number to make the critical region of proper size. It will not be necessary to
enumerate all the $V$'s, since it is readily apparent that certain $V$'s can
never belong to a critical region of any reasonable size. (Roughly speaking, a
$V$ with a large number of runs of short length will yield a small $T(V)$ and
vice versa.) For large samples, the result of Section 3 is available, with
$f(x) = \bar{x}$. From (5.10) it follows easily that the corresponding series
(3.9) is convergent, so that $T(V)$ is asymptotically normally distributed. It
must be remembered when using tables of the normal distribution that the
critical region of $T(V)$ lies in only one "tail" of the normal curve. The
greatest difficulty will occur for samples of moderate size. Methods like those
of Olds [7] will probably help there. It is highly unlikely that any
practicable formula which would give the exact distribution of $T(V)$ exists.
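
For small samples the enumeration just described is easily mechanized. The following sketch enumerates every sequence $V$ for two samples and tabulates the null distribution of $T(V)$. The function names, the rounding of $T(V)$ to nine decimals so that equal values are grouped, and the choice $n_1 = n_2 = 5$ are all assumptions of the sketch.

```python
import math
from collections import Counter
from itertools import combinations, groupby


def bar(m: int) -> float:
    """The quantity (5.7): log(m**m / m!)."""
    return m * math.log(m) - math.lgamma(m + 1)


def T_stat(V) -> float:
    """T(V) as in (5.6); run lengths are sorted so equal multisets sum identically."""
    lengths = sorted(len(list(group)) for _, group in groupby(V))
    return sum(bar(m) for m in lengths)


def null_distribution(n1: int, n2: int) -> dict:
    """Exact null distribution of T(V): every V has probability 1 / C(n, n2)."""
    n = n1 + n2
    counts = Counter()
    for ones in combinations(range(n), n2):
        V = [0] * n
        for pos in ones:
            V[pos] = 1
        counts[round(T_stat(V), 9)] += 1
    total = math.comb(n, n2)
    return {t: c / total for t, c in sorted(counts.items())}


def critical_value(dist: dict, alpha: float):
    """Smallest t with P(T >= t) <= alpha, or None if even the largest t is too likely."""
    tail, best = 0.0, None
    for t in sorted(dist, reverse=True):
        tail += dist[t]
        if tail > alpha + 1e-12:
            break
        best = t
    return best


dist = null_distribution(5, 5)
print(len(dist), critical_value(dist, 0.05))
```

The number of sequences grows as $\binom{n}{n_2}$, so a sketch of this kind is practical only for quite small samples, which is in keeping with the remark above about samples of moderate size.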

A few brief remarks may be made here on a related problem. Suppose we
have observations from two bivariate populations about the distributions of
both of which nothing is known except that they are continuous and it is sought
to test whether the two populations have the same distribution functions.
Suppose further that it were required that the statistic used for this purpose
be invariant under any topologic transformation of the whole plane into itself.
At this point we quote the following topologic theorem, the proof of which was
communicated to the author by Dr. Herbert Robbins: Let
$x_1, y_1, x_2, \cdots, x_p, y_p$ be any $2p$ distinct points in the plane.
There exists a topologic transformation of the whole plane into itself which
takes $x_i$ into $y_i$ $(i = 1, 2, \cdots, p)$. As a consequence of this
theorem we get the absurd result that the required statistic must be a
constant. Hence this statistical problem can have no solution.

As a matter of interest this statistical problem would have no solution even
if it were not for the topologic theorem. The fact is that a continuous
distribution on a line remains continuous under a topologic transformation of
the whole line into itself, but a continuous distribution in a $k$-dimensional
(Euclidean) space ($k > 1$) may become discontinuous under a topologic
transformation of the whole space into itself. (The probability distribution in
the first space always determines a probability distribution in the transformed
space, for probability functions are defined over all Borel sets of the space
(cf. [15], p. 7) and a topologic transformation carries Borel sets into Borel
sets (cf. [16], p. 195, Theorem II).) Consider the following example in the
plane: A bivariate distribution function assigns probability 1 to a line $L$
oblique to the coordinate axes, while any interval which contains no segment of
the line $L$ has probability 0. On the line $L$ the (one-dimensional)
probability distribution may be arbitrary, provided it is continuous. The
bivariate distribution function is without difficulty seen to be continuous.
Now rotate the coordinate axes until one of them is parallel to $L$. It is easy
to see that after the rotation the bivariate distribution function is
discontinuous.

The question of whether a useful statistical problem could be obtained by
properly delimiting the class of transformations which are to leave the
statistic invariant and the solution of such a problem remain to be
investigated.

6. The problem of the independence of several variates. This is an important
practical problem and one of the earliest discussed in the literature (cf., for
example, [6]). Let $X_1$ and $X_2$ be stochastic variables with the joint
(cumulative) distribution function $F(x_1, x_2)$ which is known to be
continuous in both variables jointly (i.e.,
$F(x_1, x_2) = P\{X_1 < x_1; X_2 < x_2\}$, where the right member is the
probability of the occurrence of both the relations in braces). The marginal
distributions $f_1(x_1)$ and $f_2(x_2)$ of $X_1$ and $X_2$ respectively are
defined as follows:

$f_1(x_1) = P\{X_1 < x_1\} = \lim_{x_2 \to +\infty} F(x_1, x_2),$

$f_2(x_2) = P\{X_2 < x_2\} = \lim_{x_1 \to +\infty} F(x_1, x_2).$

(It is easy to see that the continuity of $F(x_1, x_2)$ implies the continuity
of $f_1(x_1)$ and $f_2(x_2)$.)

The $n$ random, independent pairs of observations
$(x_{11}, x_{21}), \cdots, (x_{1n}, x_{2n})$ are made on $X_1$ and $X_2$. The
null hypothesis states that

(6.1)    $F(x_1, x_2) = f_1(x_1) \cdot f_2(x_2),$

i.e., that $X_1$ and $X_2$ are independent. The alternative hypotheses are that
$F(x_1, x_2)$ does not satisfy (6.1).¹

Let the set $x_{11}, x_{12}, x_{13}, \cdots, x_{1n}$ be arranged in order of
ascending size, thus: $Z = z_1, z_2, z_3, \cdots, z_n$ where
$z_1 < z_2 < \cdots < z_n$. The $j$th member of this sequence will be said to
have the rank $j$. In the same manner ranks are assigned to the $x_{2j}$
$(j = 1, \cdots, n)$. (It is easy to see that, since $f_1(x_1)$ and $f_2(x_2)$
are continuous, the probability that $z_j = z_{j+1}$ is 0, etc.) In the sequence
$Z$ the element $z_j$ $(j = 1, \cdots, n)$ is replaced by the rank of its
associated observation on $X_2$. We obtain a permutation of the integers
$1, 2, \cdots, n$ which we denote by $R$. If in the procedure for obtaining $R$,
we had reversed the roles of the $x_{1j}$ and $x_{2j}$, we would have obtained
the permutation $R'$. It is easy to see that any statistic, say $M''$, used to
test the null hypothesis, must be a function only of $R$, with the added
proviso that $M''(R) = M''(R')$. (The rank correlation coefficient is such a
statistic.) Under the null hypothesis all the $R$ have the same probability.

The procedure of applying the likelihood ratio principle to this problem would
then be as follows. $\Omega$ is the totality of all bivariate distribution
functions $H(x_1, x_2)$ which are continuous in both variables jointly. The
respective marginal distributions corresponding to $H(x_1, x_2)$ will be denoted
by $h_1(x_1)$ and $h_2(x_2)$. $\omega$ is a subset of $\Omega$ which consists of
all $H(x_1, x_2)$ for which $H(x_1, x_2) = h_1(x_1) \cdot h_2(x_2)$. The sample
space is the totality of all sequences $R$. The null hypothesis states that
$F(x_1, x_2)$ is a member of $\omega$. The admissible alternatives are that
$F(x_1, x_2)$ is a member of $\Omega$ not in $\omega$. The distribution of any
function of $R$ is the same for all members of $\omega$. Thus the essential
requirement for the applicability of the likelihood ratio criterion is
fulfilled. All sequences $R$ have the same probability for all members of
$\omega$; hence the numerator of the likelihood ratio is a function only of $n$
which may therefore be ignored. We may then define $M'(R)$, a monotonic
function of the likelihood ratio, as the denominator of the likelihood ratio,
thus:

(6.2)    $M'(R) = \max_{\Omega} P\{R; H(x_1, x_2)\}$

where $P\{R; H(x_1, x_2)\}$ is the probability of $R$ when $H(x_1, x_2)$ is the
joint distribution function of $X_1$ and $X_2$. The critical values of $M'(R)$
are the large values.

¹ It is easy to see that the independence or dependence of two stochastic
variables is not a property which will remain invariant under a topologic
transformation of the plane into itself. We therefore require of the statistic
only that it be invariant under topologic transformation of each variable into
itself, separately.

We now propose an approximation to $M'(R)$ which we shall call $M(R)$. We
do this by describing a distribution function $H^*(x_1, x_2)$ for each $R$ which
seems a plausible approximation to a maximizing distribution function. It may
be derived from certain assumptions about the nature of the maximizing
distribution function which we omit. The remarks made in the preceding section
about the character of the approximation apply here as well. As before we
specify only the form of the function and leave certain parameters, finite in
number, to be determined in accordance with (6.2). (If the construction of
$H^*(x_1, x_2)$ should appear somewhat involved, this is due only to the
analytic description. A sketch will show the essential simplicity of the
situation.) We then have

$M(R) = \max P\{R; H^*(x_1, x_2)\}.$

Let $R = a_1, a_2, \cdots, a_n$ be a given permutation of the integers 1 to $n$.
A sub-sequence $a_{(i+1)}, a_{(i+2)}, \cdots, a_{(i+l)}$ will be called a run of
length $l$ if the following conditions are fulfilled:

(6.3)    The indices of the $a$'s are consecutive,

(6.4)    If $l'$ is any integer such that $1 \le l' < l$, then
         $|a_{(i+l')} - a_{(i+l'+1)}| = 1$,

(6.5)    if $i > 0$, $|a_i - a_{(i+1)}| > 1$,

(6.6)    if $i + l < n$, $|a_{(i+l)} - a_{(i+l+1)}| > 1$.

The run will be called an ascending run or a descending run according as
$a_{(i+1)} - a_{(i+2)} = -1$ or $+1$. A run of length 1 is of either type, at
pleasure. For example, let

$R = 5, 6, 1, 4, 3, 2.$

The first run is 5, 6, the second 1, the last 4, 3, 2. 5, 6 is an ascending run
of length two, 4, 3, 2 a descending run of length three, and 1 a run of length
one.
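
The decomposition of a permutation into runs in the sense of (6.3) to (6.6) may be sketched as follows. The function name `permutation_runs` is an assumption of the sketch. It relies on the observation that in a permutation two successive differences of absolute value 1 cannot change sign, so a maximal stretch of neighbors differing by 1 is automatically ascending or descending.

```python
from typing import List, Sequence


def permutation_runs(R: Sequence[int]) -> List[List[int]]:
    """Split a non-empty permutation R into its runs: maximal stretches in
    which each pair of neighbors differs by exactly 1 in absolute value."""
    runs = [[R[0]]]
    for previous, current in zip(R, R[1:]):
        if abs(current - previous) == 1:
            runs[-1].append(current)   # condition (6.4) continues to hold
        else:
            runs.append([current])     # conditions (6.5)/(6.6): a new run begins
    return runs


# The example of the text.
print(permutation_runs([5, 6, 1, 4, 3, 2]))
```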

$H^*(x_1, x_2)$ is a degenerate distribution function such that the relation
between $X_1$ and $X_2$ is functional (this is a special case of stochastic
relationship). That is to say, $X_2 = \varphi(X_1)$, where $\varphi(X_1)$ is a
single-valued function defined for all the possible values of $X_1$, with a
single-valued inverse $\varphi^{-1}(X_2)$ defined for all possible values of
$X_2$. Hence $H^*(x_1, x_2)$ is completely specified when the function
$X_2 = \varphi(X_1)$ and $h_1^*(x_1)$, the marginal distribution function of
$X_1$, are given ($h_1^*(x_1)$ must of course be continuous).

Consider a system of intervals on the line $-\infty < x_1 < +\infty$ of which
$(i - 1, i)$ is the $i$th, $i = 1, 2, \cdots, n$, and a similar system on the
line $-\infty < x_2 < +\infty$. (Actually, as in the previous section, neither
the length of the intervals nor their location is material. The intervals need
merely be disjunct and in a certain order. We are using these particular
intervals to simplify the notation.) Let $l_1$ be the length of the first run.
$a_1$ is its first element. Then let

$p_1 = P\{0 < X_1 < l_1; h_1^*(x_1)\}$

be one of the as yet undetermined parameters. We now partly define $h_1^*(x_1)$
as follows:

(6.7)    $h_1^*(x_1) = 0, \quad x_1 < 0$
         $h_1^*(x_1) = 1, \quad x_1 > n$
         $h_1^*(l_1) = p_1.$

Within the interval $(0, l_1)$, $h_1^*(x_1)$ may be any continuous monotonic
increasing function which satisfies (6.7). We partly define $\varphi(x_1)$ as
follows:

If the first run is ascending, let

(6.8)    $\varphi(0) = a_1 - 1$

(6.9)    $\varphi(x_1) = a_1 - 1 + x_1, \quad 0 < x_1 < l_1.$

If the first run is descending, let

(6.10)    $\varphi(0) = a_1$

(6.11)    $\varphi(x_1) = a_1 - x_1, \quad 0 < x_1 < l_1.$

We proceed in this manner through all the runs of $R$. Let $l_j$ be the length
of the $j$th run. Let $\lambda_j = \sum_{i<j} l_i$. The first element of the
$j$th run is $a_{(\lambda_j+1)}$. Let

$p_j = P\{\lambda_j < X_1 < \lambda_j + l_j; h_1^*(x_1)\}$

be another of the as yet undetermined parameters. We then define $h_1^*(x_1)$ as
follows:

(6.12)    $h_1^*(\lambda_j) = \sum_{i<j} p_i$

(6.13)    $h_1^*(\lambda_j + l_j) = \sum_{i \le j} p_i.$

Within the interval $(\lambda_j, \lambda_j + l_j)$, $h_1^*(x_1)$ may be any
continuous monotonic increasing function which satisfies (6.12) and (6.13). We
define $\varphi(x_1)$ as follows: If the $j$th run is ascending, let

(6.14)    $\varphi(x_1) = a_{(\lambda_j+1)} - 1 + x_1 \quad (\lambda_j < x_1 < \lambda_j + l_j).$

If the $j$th run is descending, let

(6.15)    $\varphi(x_1) = a_{(\lambda_j+1)} - x_1 \quad (\lambda_j < x_1 < \lambda_j + l_j).$

If $l_j = 1$, the run may be considered ascending or descending at pleasure.





In order to obtain $R$, it is necessary that all the elements of a run shall
fall into its corresponding interval. Then it is easy to see that by the
multinomial theorem

(6.16)    $P\{R; H^*(x_1, x_2)\} = n! \prod_j (l_j!)^{-1} p_j^{l_j}.$

The right member of (6.16) is to be maximized with respect to the $p_j$ subject
to the constraint

(6.17)    $\sum_j p_j = 1.$

It is easy to verify that the maximum occurs when

(6.18)    $p_j = \frac{l_j}{n}.$

$M(R)$ is the right member of (6.16) after the results (6.18) have been
inserted. It is convenient to remove all factors which are functions only of
$n$ and to take the logarithm of the resulting expression. Then with a slight
change of notation we may say that

(6.19)    $M(R) = \sum_j \bar{l}_j,$

where

(6.20)    $\bar{m} = \log \frac{m^m}{m!}.$

The critical values of $M(R)$ are the large values. One may verify without much
difficulty that $M(R) = M(R')$, i.e., that the statistic is symmetric with
respect to $X_1$ and $X_2$ as indeed it should be.

This result is immediately extensible to the problem of testing whether $k$
stochastic variables $X_1, \cdots, X_k$ are independent. We shall not go into
the details, which are similar to those described above, and content ourselves
with giving the definition of a run for the case $k = 3$. After the
observations on $X_1$ have been arranged in ascending order, we obtain two
sequences $R_2$ and $R_3$, the associated ranks of the observations on $X_2$
and $X_3$. Let $R_2 = b_1, b_2, \cdots, b_n$ and
$R_3 = b'_1, b'_2, \cdots, b'_n$. The ascending sequence of consecutive integers
$(i + 1), (i + 2), \cdots, (i + l)$ determines a run of length $l$ if the
sequences $b_{(i+1)}, b_{(i+2)}, \cdots, b_{(i+l)}$ and
$b'_{(i+1)}, b'_{(i+2)}, \cdots, b'_{(i+l)}$ satisfy (6.4), and if at least one
of the sequences satisfies (6.5), and at least one, but not necessarily the same
one, satisfies (6.6). The adjectives ascending and descending apply to each
sequence separately.

Let $r_j$ be the number of runs of length $j$ in $R$. Then

(6.21)    $M(R) = \sum_j \bar{j} r_j.$
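
The statistic $M(R)$ and the symmetry $M(R) = M(R')$ may be checked numerically. In the sketch below the names `M_stat` and `inverse` are our own, and we take $R'$ to be the inverse permutation of $R$, which is what reversing the roles of the two variates amounts to when $R$ is written as a list of ranks. This identification is our reading of the construction, stated here as an assumption.

```python
import math
from itertools import permutations


def bar(m: int) -> float:
    """The quantity log(m**m / m!) entering M(R)."""
    return m * math.log(m) - math.lgamma(m + 1)


def run_lengths(R):
    """Lengths of the runs of the permutation R (neighbors differing by 1)."""
    lengths = [1]
    for previous, current in zip(R, R[1:]):
        if abs(current - previous) == 1:
            lengths[-1] += 1
        else:
            lengths.append(1)
    return lengths


def M_stat(R) -> float:
    """M(R): the sum of bar(l_j) over the runs of R."""
    return sum(bar(length) for length in sorted(run_lengths(R)))


def inverse(R):
    """Inverse permutation, assumed here to play the role of R'."""
    inv = [0] * len(R)
    for position, value in enumerate(R, start=1):
        inv[value - 1] = position
    return inv


symmetric = all(
    math.isclose(M_stat(R), M_stat(inverse(R)), abs_tol=1e-12)
    for R in permutations(range(1, 7))
)
print(symmetric, round(M_stat([5, 6, 1, 4, 3, 2]), 4))
```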

Most of the remarks made in Section 5 about the small sample distribution of
$T(V)$ are also applicable to the distribution of $M(R)$. More will be said in
the next section about the distribution of $M(R)$ which involves the solution
of a combinatorial problem not discussed in the literature.

7. On the distribution of $W(R)$. While most of the remarks made about the
small sample distribution of $T(V)$ apply to the question of the distribution
of $M(R)$ in small samples, the situation with respect to the distribution of
$M(R)$ in samples of medium size and large size is very different and, in
certain respects, is more favorable for practical application than is the case
with $T(V)$. It would be reasonable to expect, for example, in view of Section
3 and of the structure of the statistic $M(R)$ that the asymptotic distribution
of $M(R)$ should be normal. Surprisingly enough, this is not the case. It is
not even continuous. In order to clarify the situation, we begin with a few
necessary ideas and definitions.

Let the stochastic variable $W(R)$ be defined as the total number, in $R$, of
runs of the sense of Section 6. We shall be interested in the distribution of
$W(R)$. The number $n$ of the pairs of observations on $X_1$ and $X_2$ (we
consider the case of two variates) will be assumed arbitrary but fixed
throughout the discussion and will not be exhibited. Let $N(k)$ be the number
of sequences $R$ (of the integers 1 to $n$) which contain exactly $k$ runs.

Consider, for example, for the case $n = 6$, the sequence 2 3 4 6 5 1. We
shall say that this sequence contains the "contacts" (2, 3), (3, 4), (6, 5). In
general, a contact is defined as the juxtaposition, in the sequence $R$, of
consecutive numbers, whether in ascending or descending order. If $k$ is the
number of runs and $l$ the number of contacts in a sequence $R$, then obviously

(7.1)    $k + l = n.$
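
Counting contacts is the whole of the computation of $W(R)$, as the following sketch shows. The function names are assumptions of the sketch; the runs are counted independently of the contacts so that the relation (7.1) is actually checked rather than assumed.

```python
from itertools import permutations


def contacts(R):
    """Neighboring pairs of R that are consecutive integers, in either order."""
    return [(a, b) for a, b in zip(R, R[1:]) if abs(a - b) == 1]


def count_runs(R):
    """Number of runs of R, counted directly by splitting R wherever
    two neighbors are not consecutive integers."""
    runs, current = [], [R[0]]
    for a, b in zip(R, R[1:]):
        if abs(a - b) == 1:
            current.append(b)
        else:
            runs.append(current)
            current = [b]
    runs.append(current)
    return len(runs)


example = [2, 3, 4, 6, 5, 1]
print(contacts(example), count_runs(example))

# Relation (7.1), k + l = n, verified for every permutation of 1..6.
relation_holds = all(
    count_runs(R) + len(contacts(R)) == len(R)
    for R in permutations(range(1, 7))
)
print(relation_holds)
```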

Let $R_0$ be the sequence $1, 2, \cdots, n$ of the first $n$ integers in
ascending order. The $n - 1$ contacts of this sequence may themselves be
arranged in a sequence $R^*$ of contacts, thus:

$(1, 2), (2, 3), \cdots, (n - 1, n).$

Suppose $l$ of the contacts which constitute the sequence $R^*$ are selected in
some manner to form the set $O$. The remaining $n - 1 - l$ contacts form the
complementary set $O'$. After this selection the sequence $R^*$ may be
considered a sequence of the type of the sequences $V$ of Section 5 with the
members of $O$ playing the role of the elements 0 and the members of $O'$
playing the role of the elements 1. When $R^*$ is considered in this manner we
will write it as $R^*(O)$. The definition of a run of Section 5 as applied to
sequences $V$ is now applicable to $R^*(O)$. We will call any such run of the
members of $O$ or of $O'$ a group.

We wish first to answer the following question: In how many ways can the
set $O$ be selected from among the elements of $R^*$ so that it will contain
$l$ members arranged in $R^*(O)$ in $i$ groups? If, for a given $O$, $i'$ be
the number of groups into which $O'$ is divided in $R^*(O)$, it is clear that
$i - i'$ can equal only $-1$, $0$, or $+1$. Hence only four situations can
arise, as follows:

a) $i' = i + 1$. The first group in $R^*(O)$ is therefore composed of elements
of $O'$. The number of ways in which $l$ elements can be divided into $i$ runs
of the type of Section 2 is the coefficient of $x^l$ in the purely formal
expansion of

$(x + x^2 + x^3 + \cdots)^i = x^i (1 - x)^{-i}$

and is therefore $\binom{l-1}{i-1}$. Similarly $n - l - 1$ elements can be
divided into $i' = i + 1$ runs in $\binom{n-l-2}{i}$ ways. Hence this situation
will arise in

$\binom{l-1}{i-1} \binom{n-l-2}{i}$

ways.

b) $i' = i - 1$. By a similar argument as above, this can occur in

$\binom{l-1}{i-1} \binom{n-l-2}{i-2}$

ways.

c) $i' = i$ and the first group is made up of elements from $O$. This will
occur in $\binom{l-1}{i-1} \binom{n-l-2}{i-1}$ ways.

d) $i' = i$ and the first group is made up of elements from $O'$. This will
also occur in $\binom{l-1}{i-1} \binom{n-l-2}{i-1}$ ways.

The set $O$ which contains $l$ elements arranged in $i$ groups can therefore be
selected in

(7.2)    $\binom{l-1}{i-1} \left[ \binom{n-l-2}{i} + \binom{n-l-2}{i-2} + 2 \binom{n-l-2}{i-1} \right]$

ways, and the quantity (7.2) is, by elementary combinatorics, equal to

(7.3)    $\binom{l-1}{i-1} \binom{n-l}{i}.$


Let any set $O$ of $l$ contacts divided into $i$ groups be selected from $R^*$.
Imagine that each contact in $O$ sets up, in $R_0$, an unbreakable bond which
links the two elements involved in the contact, but no contact in $O'$ creates
such a bond. Given these bonds set up by $O$, we seek the number of different
sequences into which the $n$ elements of $R_0$ can be permuted while respecting
these bonds. Since there are $l$ bonds, we can actually manipulate only $n - l$
entities, except that two elements linked by a bond may have their order
reversed; for example, if $O$ contains (1, 2), 1 may either precede or follow 2
and the bond would still be respected. However, if one contact in a group is
reversed, the group as a whole must be reversed, else a bond would be broken.
Hence the number of distinct sequences into which $R_0$ may be permuted while
all the bonds set up by $O$ are respected is $2^i (n - l)!$.
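
The count of the selections of $l$ contacts falling into $i$ groups, namely $\binom{l-1}{i-1}\binom{n-l}{i}$, can be confirmed by direct enumeration for small $n$. In this sketch the contacts $(1, 2), \cdots, (n-1, n)$ of $R^*$ are represented by their positions $1, \cdots, n-1$; this encoding and the function names are assumptions of the sketch.

```python
from itertools import combinations
from math import comb


def count_groups(selected) -> int:
    """Number of groups: maximal stretches of adjacent selected positions."""
    chosen = set(selected)
    # A group starts at every selected position whose left neighbor is not selected.
    return sum(1 for pos in chosen if pos - 1 not in chosen)


def selections_by_groups(n: int, l: int) -> dict:
    """Map i -> number of ways to select l of the n - 1 contacts in i groups."""
    tally = {}
    for subset in combinations(range(1, n), l):
        i = count_groups(subset)
        tally[i] = tally.get(i, 0) + 1
    return tally


def closed_form(n: int, l: int, i: int) -> int:
    """The closed form C(l-1, i-1) * C(n-l, i) for the same count."""
    return comb(l - 1, i - 1) * comb(n - l, i)


n = 8
agrees = all(
    selections_by_groups(n, l).get(i, 0) == closed_form(n, l, i)
    for l in range(1, n)
    for i in range(1, l + 1)
)
print(agrees)
```

`math.comb` returns 0 when the lower index exceeds the upper, which agrees with the usual convention for combinatorial symbols in this kind of argument.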

Let us refer to the sequences thus obtained as the family generated by $O$.
All the sequences in a family are distinct. Now let $O$ range over all sets of
$l$ contacts selected from $R^*$. The various families obtained will not be
disjunct, but some will have sequences in common. In spite of this, we seek the
total of the number of sequences in all the families. The total of the number
of sequences in all the families generated by sets of $l$ contacts divided into
$i$ groups is, by (7.3) and the result of the preceding paragraph,

(7.4)    $2^i \binom{l-1}{i-1} \binom{n-l}{i} (n - l)!.$

Sets of $l$ contacts may consist of $1, 2, \cdots, l$ groups, so that the total
number of sequences in all the families generated by sets of $l$ contacts is

(7.5)    $A_l = \sum_{i=1}^{l} 2^i \binom{l-1}{i-1} \binom{n-l}{i} (n - l)!,$

where $l$ may take the values $1, 2, \cdots, (n - 1)$. The conventions on the
combinatorial symbols will be:

$\binom{a}{0} = 1, \quad a \ge 0,$

$\binom{a}{b} = 0, \quad a < b.$

Define $A_0$ as

(7.6)    $A_0 = n!.$

The following equation is trivial:

(7.7)    $A_0 = \sum_i N(i).$


We now consider all the families generated by sets $O$ which contain exactly
$l$ contacts. As was said before, the total of the number of sequences in each
is $A_l$. Let $H(l)$ be the set of all the sequences in all these families,
with each sequence in $H(l)$ counted as many times as the number of families in
which it occurs. Every sequence in $H(l)$ has the $l$ contacts of the set $O$
which generated it, but after permuting $R_0$ other contacts may still exist.
Hence every sequence in $H(l)$ has at least $l$ contacts and therefore, by
(7.1), at most $n - l$ runs. Clearly, a sequence which has exactly $l$ contacts
occurs exactly once in $H(l)$, since it can appear only in the family generated
by the set $O$ of its $l$ contacts and in no other family. A sequence which has
exactly $(l + 1)$ contacts will appear exactly $\binom{l+1}{l}$ times in
$H(l)$, for it will appear once in each family generated by a set $O$ which
consists of one of the $\binom{l+1}{l}$ selections of $l$ contacts from among
its $(l + 1)$ contacts, and in no other family. Similarly each sequence which
has exactly $(l + 2)$ contacts will appear in $H(l)$ $\binom{l+2}{l}$ times, and
so forth. We therefore obtain, in view of (7.1),

(7.8)    $A_l = \sum_{i=l}^{n-1} \binom{i}{l} N(n - i) \quad (l = 1, 2, \cdots, (n - 1)).$

The system of $n$ linear equations (7.7) and (7.8) completely determines the
quantities $N(1), N(2), \cdots, N(n)$. The matrix of these equations has a
determinant whose absolute value is 1, so that the quantities
$N(1), N(2), \cdots, N(n)$ may readily be expressed in determinantal form.
Furthermore the moments of $W(R)$ are readily found from these equations. Thus
from (7.8) for $l = 1$ we find

(7.9)    $E(W(R)) = \frac{2}{n} + n - 2$

and from (7.8) for $l = 2$ and $l = 1$ we find, after a little obvious
manipulation,

(7.10)    $\sigma^2(W(R)) = \frac{4}{n^2 (n - 1)} - \frac{6}{n} + 2.$

Higher moments of $W(R)$ may be found in similar manner.
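
The relations between the totals $A_l$ and the numbers $N(k)$, and the first two moments of $W(R)$, can be checked against a complete enumeration for small $n$. The sketch below takes $A_l$ to be $\sum_{i=1}^{l} 2^i \binom{l-1}{i-1}\binom{n-l}{i}(n-l)!$ with $A_0 = n!$, and the mean and variance to be $n - 2 + 2/n$ and $2 - 6/n + 4/(n^2(n-1))$. These closed forms are our own working from the counting argument of this section and should be read as assumptions of the sketch, as should all function names.

```python
from collections import Counter
from fractions import Fraction
from itertools import permutations
from math import comb, factorial


def count_runs(R) -> int:
    """W(R): one more than the number of neighboring pairs that are not contacts."""
    return 1 + sum(1 for a, b in zip(R, R[1:]) if abs(a - b) != 1)


def N_table(n: int) -> Counter:
    """N(k): number of permutations of 1..n with exactly k runs, by enumeration."""
    return Counter(count_runs(R) for R in permutations(range(1, n + 1)))


def A(n: int, l: int) -> int:
    """Total number of sequences in all families generated by sets of l contacts."""
    if l == 0:
        return factorial(n)
    return sum(
        2**i * comb(l - 1, i - 1) * comb(n - l, i) * factorial(n - l)
        for i in range(1, l + 1)
    )


def weighted_count(n: int, l: int, N: Counter) -> int:
    """Sum over i of C(i, l) * N(n - i): sequences with i contacts counted C(i, l) times."""
    return sum(comb(i, l) * N[n - i] for i in range(l, n))


n = 6
N = N_table(n)
identities_hold = all(A(n, l) == weighted_count(n, l, N) for l in range(0, n))

total = factorial(n)
mean = Fraction(sum(k * c for k, c in N.items()), total)
variance = Fraction(sum(k * k * c for k, c in N.items()), total) - mean**2

print(identities_hold, mean, variance)
```

Exact rational arithmetic via `fractions.Fraction` avoids any rounding question when the enumerated moments are compared with closed forms.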

Since the limiting variance of $W(R)$ is 2 it follows that the asymptotic
distribution is not continuous. For $n$ of any size the bulk of the values are
concentrated in a short interval ending at $n$. When $W(R) = n$, $M(R) = 0$,
when $W(R) = n - 1$, $M(R) = \log 2$, and when $W(R) = n - 2$,
$M(R) = \log 4\tfrac{1}{2}$ or $\log 4$. It is easy to see that for the values
of $W(R)$ which differ very little from $n$ there are only a small number of
values of $M(R)$, whose asymptotic distribution is also discontinuous. The
statistic $W(R)$ is therefore a good approximation to the statistic $M(R)$ for
the purposes of tests of significance (for $M(R)$ the large values are the
critical values and for $W(R)$ the small values are critical), and has a few
additional practical advantages. It is even easier to compute than $M(R)$; the
computation is best performed by counting contacts. Since the limiting variance
is a small constant, it follows that many tests of significance can be
performed simply by use of Tchebycheff's inequality. For example, suppose a
given large sample contains 9 contacts, i.e., $n - 9$ runs (we say a "large"
sample in order to use the simple limiting mean and variance; if desired or for
a small sample these latter may be computed exactly by (7.9) and (7.10)). Then
by Tchebycheff's inequality it follows that the probability of obtaining
$n - 9$ or fewer runs is less than .041. Thus the presence of 9 contacts would
be sufficient to render a sample of great size significant on a 5% level. For
the few numbers of contacts about which doubt will exist as to whether or not
they are critical values two procedures are possible. Either the equations
(7.7) and (7.8) may be solved exactly for the doubtful values, or several
higher moments may be found from (7.8) and the methods of Wald [14] can be
applied to delimit the missing probabilities to any accuracy desired. By
enumerating the few values of $M(R)$ which correspond to several of the largest
values of $W(R)$ the distribution of $M(R)$ may be computed sufficiently to
serve the purposes of tests of significance.
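
The arithmetic of the Tchebycheff bound in the example may be sketched as follows. The limiting mean $n - 2$ and limiting variance 2 are those used in the text; the finite-$n$ mean $n - 2 + 2/n$ and variance $2 - 6/n + 4/(n^2(n-1))$ are our own working and are labeled as assumptions, as are the function names.

```python
from fractions import Fraction


def limiting_bound(contacts: int) -> Fraction:
    """Tchebycheff bound on P(W <= n - contacts) with limiting mean n - 2, variance 2."""
    deviation = contacts - 2  # distance of n - contacts below the limiting mean n - 2
    if deviation <= 0:
        raise ValueError("the bound is informative only for more than 2 contacts")
    return Fraction(2, deviation**2)


def finite_bound(n: int, contacts: int) -> Fraction:
    """The same bound with the finite-n mean and variance (assumed closed forms)."""
    mean = Fraction(2, n) + n - 2
    variance = Fraction(4, n * n * (n - 1)) - Fraction(6, n) + 2
    deviation = mean - (n - contacts)
    if deviation <= 0:
        raise ValueError("the observed number of runs is not below the mean")
    return variance / deviation**2


print(float(limiting_bound(9)))
print(float(finite_bound(100, 9)))
```

Tchebycheff's inequality bounds the probability of a deviation in either direction, so the figure obtained is a conservative bound for the one-sided probability of so few runs.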





REFERENCES

[1] Maurice Fréchet, Généralités sur les Probabilités. Variables Aléatoires, Paris (1937).
[2] A. Wald and J. Wolfowitz, Annals of Math. Stat., Vol. 11 (1940), p. 147.
[3] A. M. Mood, Annals of Math. Stat., Vol. 11 (1940), p. 367.
[4] J. Neyman, Phil. Trans. Roy. Soc. London, Vol. 231 (1937), pp. 333-380.
[5] A. Wald and J. Wolfowitz, Annals of Math. Stat., Vol. 10 (June, 1939), p. 105.
[6] H. Hotelling and M. Pabst, Annals of Math. Stat., Vol. 7 (1936), p. 29.
[7] E. G. Olds, Annals of Math. Stat., Vol. 9 (1938), p. 133.
[8] W. R. Thompson, Annals of Math. Stat., Vol. 9 (1938), p. 281.
[9] W. J. Dixon, Annals of Math. Stat., Vol. 11 (June, 1940), p. 199.
[10] Milton Friedman, Jour. Amer. Stat. Assoc., Vol. 32 (1937), p. 676.
[11] J. Neyman and E. Pearson, Trans. Royal Soc., A, Vol. 231 (1933), p. 295.
[12] A. Wald, Bull. Amer. Math. Soc., Vol. 46 (1940), p. 235.
[13] A. Wald, Bull. Amer. Math. Soc., Vol. 47 (1941), p. 396.
[14] A. Wald, Trans. Amer. Math. Soc., Vol. 46 (1939), p. 280.
[15] Harald Cramér, Random Variables and Probability Distributions, Cambridge (1937).
[16] F. Hausdorff, Mengenlehre (Second Edition), Berlin and Leipzig, 1927.



ON THE THEORY OF TESTING COMPOSITE HYPOTHESES
WITH ONE CONSTRAINT

By Henry Scheffé

Princeton University

1. Introduction. Our purpose is to extend some of the Neyman-Pearson 
theory of testing hypotheses to cover certain cases of freciuent interest which are 
complicated by the presence of nuisance parameters. Our results give methods 
of finding critical regions of types B and Bi . Type B regions were defined by 
Neyman [1] for the case of one nuisance parameter. Type Bi regions are the 
natural generalization of the type Ai regions of Neyman and Pearson [6] to 
permit the occurrence of nuisance parameters. The reader familiar with the 
work of these authors will recognize most of the notation and some of the 
methods. 

We consider a joint distribution of $n$ random variables $x_1, x_2, \ldots, x_n$,
depending on $l$ parameters $\theta_1, \theta_2, \ldots, \theta_l$, $l \le n$. The functional form of
the distribution is given. The random variables may be regarded as the
coordinates of a point $E$ in an $n$-dimensional sample space $W$, the parameters
as the coordinates of a point $\Theta$ in an $l$-dimensional space $\Omega$ of admissible
parameter values. $\Omega$, unlike $W$, in general will not be a complete Euclidean space.
Let $\omega$ denote the subspace of $\Omega$ defined by $\theta_1 = \theta_1^0$. The hypothesis we consider is

\[ H_0 : \Theta \in \omega. \]

Neyman and Pearson [4] call $H_0$ a hypothesis with $l - 1$ degrees of freedom;
for our present purpose we shift the emphasis by saying it has one constraint.

It is clear that whenever we test whether a parameter has a given value, and
other parameters occur in the distribution, we are testing a hypothesis with one
constraint. Hypotheses of the type $\theta_1 = \theta_2$, in which we do not specify the
common value of $\theta_1$ and $\theta_2$, nor the values of any other parameters, may always
be transformed to $H_0$ by choosing new parameters. In general, the hypothesis
that the parameter point $\Theta$ lies on some hypersurface in $\Omega$,
$g(\theta_1, \theta_2, \ldots, \theta_l) = g_0$, may be transformed to $H_0$ if the function $g$ satisfies
certain conditions; say, $g$ is continuous and monotone-increasing in one of the
$\theta$'s for all $\Theta$ in $\Omega$.
Another circumstance lending importance to the theory of testing hypotheses
with one constraint is its connection with the theory of confidence intervals,
which we shall point out below.
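
As a small worked illustration of the change of parameters mentioned above (the primed symbols are introduced here only for the example and do not appear elsewhere in the paper), the hypothesis that two parameters are equal may be brought to the standard form with the hypothesized value zero:

```latex
\[
\theta_1' = \theta_1 - \theta_2, \qquad
\theta_i' = \theta_i \quad (i = 2, 3, \ldots, l),
\qquad\text{so that}\qquad
\theta_1 = \theta_2 \;\Longleftrightarrow\; \theta_1' = 0 .
\]
```

The unspecified common value of the two parameters is then carried by the nuisance parameter $\theta_2'$, and the hypothesis places a single constraint on the new parameter point.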

The path which led Neyman to critical regions of type $B$ is the following:
Every Borel-measurable region $w$ of sample space determines a test of $H_0$,
which consists of rejecting $H_0$ if and only if $E$ falls in $w$. In deciding which
is a most efficient test, one may limit the competition to similar¹ regions, if
such exist. Because of the general non-existence [2, p. 372] of uniformly most
powerful tests, one is led to consider common best critical regions [4] if he is
interested only in alternatives $\theta_1 < \theta_1^0$ (or $\theta_1 > \theta_1^0$), or else regions giving an
unbiased test [1, p. 251]. Narrowing the competition further to the latter
class of regions, one is led to regions of type $B$ if he seeks tests which are most
powerful for $\theta_1$ very near to $\theta_1^0$, and to type $B_1$ regions if he is not content with
this. These types of regions are defined in section 2.

¹ Defined by condition (a) of definition 1.

We may now state the relationship of hypotheses with one constraint to the
theory of confidence intervals [2]. To find confidence intervals for $\theta_1$, we must
first find similar regions $w(\theta_1^0)$ for testing $H_0$. If with every admissible $\theta_1^0$
we can associate a $w(\theta_1^0)$, then confidence regions for $\theta_1$ are determined, and if
these be intervals, they are confidence intervals. Every class of similar regions
mentioned above is intimately related to a category of confidence intervals.
In particular, to find Neyman’s short unbiased confidence intervals we must
first solve the problem of type $B$ regions. Likewise, if we define shortest
unbiased confidence intervals in the obvious way along the lines laid down by
Neyman, their discovery rests on the solution of the problem of type $B_1$ regions.
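
The passage from a family of critical regions to confidence regions can be sketched numerically. The following is only a toy illustration under assumptions not made in the paper: a single normal mean with known standard deviation and no nuisance parameter, the familiar two-sided region for each hypothesized value, and hypothetical function names. The confidence set is simply the collection of hypothesized values whose critical region does not contain the observed sample mean.

```python
import math

Z_975 = 1.959964  # upper 2.5% point of the standard normal distribution

def accepts(xbar, theta0, sigma, n, z=Z_975):
    """True when xbar lies outside the two-sided critical region for theta0."""
    return abs(xbar - theta0) <= z * sigma / math.sqrt(n)

def confidence_set(xbar, sigma, n, grid):
    """All hypothesized values on the grid that the test does not reject."""
    return [theta0 for theta0 in grid if accepts(xbar, theta0, sigma, n)]

if __name__ == "__main__":
    grid = [i / 100.0 for i in range(-200, 201)]
    kept = confidence_set(0.3, 1.0, 25, grid)
    print(min(kept), max(kept))
```

In this toy case the retained grid points form an unbroken run, so the confidence region is an interval, which is the situation described in the text.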

While the assumptions of section 3, especially 3°, are unpleasantly restrictive
(they are obviously tailored to fit the proof rather than the problem), they are
nevertheless satisfied in many sampling problems associated with normal
distributions. An application of the theorems of section 4 will be given in
another paper, On the ratio of the variances of two normal populations. The present
theory was needed to round out that paper and was originally planned as a
section thereof. However, it seems desirable for the convenience of other
workers who might have use for the theory not to bury it under the preceding
title.

Section 5 consists of an appendix on the moment problem raised by
assumption 5°.

2. Definitions. The symbols $w$, $w_0$, $w_1$ will always be understood to denote
Borel-measurable regions in $W$. We shall symbolize $\partial^i \Pr\{E \in w \mid \Theta\}/\partial\theta_1^i$ for
$i = 0, 1, 2$ by $P(w \mid \Theta)$, $P'(w \mid \Theta)$, $P''(w \mid \Theta)$, respectively. Since $\theta_1$ plays a
distinguished rôle, it will often be convenient to write $\Theta = (\theta_1, \vartheta)$, where the
nuisance parameters are denoted by $\vartheta = (\theta_2, \theta_3, \ldots, \theta_l)$.

Definition 1: $w_0$ is said to be a type $B$ region for testing $H_0$ if for all $\Theta$ in $\omega$

(a) $P(w_0 \mid \theta_1^0, \vartheta) = \alpha$, where $\alpha$ is independent of $\vartheta$,

(b) $P'(w_0 \mid \theta_1^0, \vartheta)$, $P''(w_0 \mid \theta_1^0, \vartheta)$ exist,

(c) $P'(w_0 \mid \theta_1^0, \vartheta) = 0$,

(d) $P''(w_0 \mid \theta_1^0, \vartheta) \ge P''(w_1 \mid \theta_1^0, \vartheta)$ for all $w_1$ satisfying (a), (b), (c).

Definition 2: $w_0$ is said to be of type $B_1$ if the conditions (a), (b'), (c), (d')
are satisfied. The conditions (a), (c) are given in definition 1; the other two are

(b') $P'(w_0 \mid \theta_1, \vartheta)$ is continuous in $\theta_1$ at $\theta_1 = \theta_1^0$ for all $\Theta$ in $\omega$,

(d') $P(w_0 \mid \theta_1, \vartheta) \ge P(w_1 \mid \theta_1, \vartheta)$ for all $w_1$ satisfying (a), (b'), (c), and all
$\Theta$ in $\Omega$.
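
Conditions (a), (c) and (d) can be made concrete in a toy case. The sketch below is an illustration under assumptions that are not part of the paper: one observation $Z$ from a normal distribution with unit variance, a single parameter and no nuisance parameters, and hypothetical function names. It compares a two-sided region with a one-sided region of the same size by differencing their power functions at the hypothesized value.

```python
import math

def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def power_two_sided(theta, c):
    """P(|Z + theta| > c) for Z standard normal: the region |z| > c."""
    return norm_cdf(-c - theta) + 1.0 - norm_cdf(c - theta)

def power_one_sided(theta, c):
    """P(Z + theta > c) for Z standard normal: the region z > c."""
    return 1.0 - norm_cdf(c - theta)

def derivatives_at_null(power, h=1e-3):
    """Central-difference estimates of P' and P'' at theta = 0."""
    p_plus, p_zero, p_minus = power(h), power(0.0), power(-h)
    first = (p_plus - p_minus) / (2.0 * h)
    second = (p_plus - 2.0 * p_zero + p_minus) / (h * h)
    return first, second

if __name__ == "__main__":
    two = derivatives_at_null(lambda t: power_two_sided(t, 1.959964))
    one = derivatives_at_null(lambda t: power_one_sided(t, 1.644854))
    print("two-sided:", two)   # first derivative vanishes, second is positive
    print("one-sided:", one)   # first derivative does not vanish
```

Both regions have size 0.05, so both meet the size requirement, but only the two-sided region has a vanishing first derivative at the hypothesized value; the one-sided region is excluded from the competition by condition (c).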



3. Assumptions. We shall denote by $p(E \mid \Theta)$ the
p.d.f. (probability density function) of the random variables $x_1, x_2, \ldots, x_n$, whose
distribution depends on the parameters $\theta_1, \theta_2, \ldots, \theta_l$. The set of assumptions is
similar to that of Neyman (reference [1]).

1°. (a) There exists a p.d.f. $p(E \mid \Theta)$ such that for any $w$, and any $\Theta \in \Omega$,

\[ P(w \mid \Theta) = \int_w p(E \mid \Theta)\, dW, \tag{1} \]

where $dW$ denotes the volume element $dx_1\, dx_2 \cdots dx_n$.

(b) The region $W_+$ in $W$ defined by $p(E \mid \Theta) > 0$ is independent of $\Theta$ for
$\Theta \in \omega$.

(c) The connectivity of $\omega$ is such that it is possible to pass from any point
$\Theta'$ in $\omega$ to any other point $\Theta''$ in $\omega$ by a path lying entirely in $\omega$ and consisting
of a finite number of segments on each of which all but one of $\theta_2, \theta_3, \ldots, \theta_l$
are constant.

2°. For all $E \in W_+$ and $\Theta \in \omega$, $p(E \mid \Theta)$ is differentiable twice with respect to $\theta_1$
and indefinitely with respect to $\theta_2, \theta_3, \ldots, \theta_l$. For any $w$, and any $\Theta \in \omega$,
the corresponding derivatives of $P(w \mid \Theta)$ exist and may be obtained by
differentiating under the integral sign in (1).

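We now define

\[ \phi_i = \partial \log p(E \mid \Theta)/\partial\theta_i, \qquad \phi_{ij} = \partial\phi_i/\partial\theta_j, \qquad i, j = 1, 2, \ldots, l. \]

3°. For all $E \in W_+$ and $\Theta \in \omega$, $\phi_i(E, \Theta)$ is continuous in $E$, $i = 1, 2, \ldots, l$,
and

\[ \phi_{ij} = A_{ij} + \sum_{m=2}^{l} B_{ijm}\, \phi_m, \qquad i, j = 2, 3, \ldots, l, \tag{2} \]

\[ \phi_{1i} = A_i + \sum_{m=1}^{l} B_{im}\, \phi_m, \qquad i = 1, 2, \ldots, l, \tag{3} \]

where the coefficients $A_{ij}$, $B_{ijm}$, $A_i$, $B_{im}$ are independent of $E$ and are
continuous in each of $\theta_2, \theta_3, \ldots, \theta_l$.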

4°. The matrix $(\partial\phi_i/\partial x_j)$, $i = 1, 2, \ldots, l$; $j = 1, 2, \ldots, n$, contains an
$l \times l$ minor which is non-singular² for all $E \in W_+$ and $\Theta \in \omega$, and whose elements
are continuous in $E$.

Write $\Phi = (\phi_2, \phi_3, \ldots, \phi_l)$ and denote by $p(\phi_1, \Phi \mid w, \Theta)$ the p.d.f. of $(\phi_1, \Phi)$
calculated under the assumption that $E \in w$, i.e., that the p.d.f. of $E$ is
$p(E \mid \Theta)/P(w \mid \Theta)$ for $E \in w$ and zero for $E \in W - w$. Define

\[ Q_s(\Phi \mid w, \Theta) = \int_{-\infty}^{+\infty} \phi_1^s\, p(\phi_1, \Phi \mid w, \Theta)\, d\phi_1 . \tag{4} \]

Let $w_1$ be any region satisfying condition (a) of definition 1.

5°. We assume, for each $\Theta \in \omega$, that if the moments³ of $Q_s(\Phi \mid w_1, \Theta)$ and
$Q_s(\Phi \mid W, \Theta)$ are the same then these functions are equal for almost all $\Phi$,

(a) for $s = 0$,

(b) for $s = 1$.

Note that $Q_0$ is a p.d.f., $Q_1$ is not.

² If for each $\Theta \in \omega$, 4° is violated on an exceptional set $U(\Theta)$ for which $P(U(\Theta) \mid \Theta) = 0$,
the theorems 1 and 2 may still be valid. What is essential is the existence of the p.d.f.
$p(\phi_1, \phi_2, \ldots, \phi_l \mid w, \Theta)$ for all $\Theta \in \omega$. On reconsidering the theorems and their proofs, the
reader will see that if the set $U(\Theta)$ is deleted from $W_+$, then 1°(b) may be violated, but not
seriously, and no essential changes are necessary. The addition of the necessary
qualifying clauses to our statements, regarding sets of probability zero, would encumber the
developments.


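4. Theorems. A result of Neyman’s [1] for $l = 2$ is generalized in the
following:

Theorem 1⁴. Under the assumptions 1° to 5°, consider the existence of functions
$k_i(\Phi, \theta_1^0, \vartheta)$, $i = 1, 2$, such that $k_1 < k_2$ and

\[ \int_{k_1}^{k_2} \phi_1^s\, p(\phi_1, \Phi \mid \theta_1^0, \vartheta)\, d\phi_1 = (1 - \alpha) \int_{-\infty}^{+\infty} \phi_1^s\, p(\phi_1, \Phi \mid \theta_1^0, \vartheta)\, d\phi_1, \qquad s = 0, 1, \tag{5} \]

for all $\Phi = (\phi_2, \phi_3, \ldots, \phi_l)$. If such functions exist for some $\Theta = \Theta' \in \omega$, they
exist for all $\Theta \in \omega$. Then the region $w_0$ in $W$ defined by

\[ \phi_1(E, \theta_1^0, \vartheta) < k_1(\Phi, \theta_1^0, \vartheta) \quad\text{or}\quad \phi_1(E, \theta_1^0, \vartheta) > k_2(\Phi, \theta_1^0, \vartheta) \tag{6} \]

is independent of $\vartheta$, and is a region of type $B$ for testing the hypothesis $H_0$.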
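
The pair of side conditions (5) can be solved numerically once a density for $\phi_1$ is given. The sketch below is an illustration only, under an assumption not made at this point of the paper: that, after an increasing linear change of variable (which leaves the two conditions equivalent), the density in question is that of a chi-square variable $u$ with $\nu$ degrees of freedom, as happens in familiar normal-variance problems. The function names are hypothetical. The $s = 1$ condition uses the identity $u f_\nu(u) = \nu f_{\nu+2}(u)$, so both conditions reduce to statements about chi-square distribution functions.

```python
import math

def reg_lower_gamma(a, x):
    """Regularized lower incomplete gamma function P(a, x), by power series."""
    if x <= 0.0:
        return 0.0
    term = 1.0 / a
    total = term
    n = 1
    while term > 1e-17 * total:
        term *= x / (a + n)
        total += term
        n += 1
    return total * math.exp(-x + a * math.log(x) - math.lgamma(a))

def chi2_cdf(x, nu):
    return reg_lower_gamma(nu / 2.0, x / 2.0)

def bisect(f, lo, hi, iters=80):
    """Root of f on [lo, hi], assuming f changes sign on the interval."""
    f_lo = f(lo)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        f_mid = f(mid)
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def type_b_limits(nu, alpha):
    """Limits k1 < k2 with mass 1 - alpha between them (s = 0) and
    the matching first-moment condition (s = 1)."""
    upper = nu + 200.0  # bracket for the upper limit; adequate for moderate nu
    q_alpha = bisect(lambda x: chi2_cdf(x, nu) - alpha, 0.0, upper)

    def k2_of(k1):
        target = 1.0 - alpha + chi2_cdf(k1, nu)        # s = 0 condition
        return bisect(lambda x: chi2_cdf(x, nu) - target, k1, upper)

    def first_moment_gap(k1):                           # s = 1 condition
        k2 = k2_of(k1)
        return chi2_cdf(k2, nu + 2) - chi2_cdf(k1, nu + 2) - (1.0 - alpha)

    k1 = bisect(first_moment_gap, 1e-9, q_alpha * (1.0 - 1e-6))
    return k1, k2_of(k1)

if __name__ == "__main__":
    print(type_b_limits(10, 0.05))
```

The resulting limits are not the equal-tailed ones; the first-moment condition shifts both of them so that the power function has a stationary point at the hypothesized value.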

Since throughout the proof $\Theta = (\theta_1^0, \vartheta)$, we shall write $\Theta$ in place of these
symbols to simplify the printing. It is to be understood that every statement
in the proof involving the symbol $\Theta$ is asserted for all $\Theta$ in $\omega$.

We suppose first that a type $B$ region $w_0$ exists in $W_+$. Then from (a), (c)
of definition 1 and assumptions 1°(a) and 2°,

\[ \int_{w_0} p(E \mid \Theta)\, dW = \alpha, \tag{7} \]

\[ \int_{w_0} \phi_1\, p(E \mid \Theta)\, dW = 0. \tag{8} \]


Since the value of the integral (7) is independent of $\vartheta$, all its derivatives with
respect to $\theta_2, \theta_3, \ldots, \theta_l$ must vanish. This leads [3, pp. 50, 61. Insert $k_i$
before $\phi_i^{k_i}$ in (15)] to

\[ \frac{1}{\alpha} \int_{w_0} \prod_{i=2}^{l} \phi_i^{k_i}\, p(E \mid \Theta)\, dW = M(k_2, k_3, \ldots, k_l; \Theta), \qquad k_i = 0, 1, 2, \ldots, \tag{9} \]

where $M$ is independent of $w_0$, and thus has the value obtained from (9) by
putting $w_0 = W$ and $\alpha = 1$. In particular,

\[ \int_{w_0} \phi_i\, p(E \mid \Theta)\, dW = 0, \qquad i = 2, 3, \ldots, l. \tag{10} \]

³ By this term we include “product moments.”

⁴ When I communicated this theorem to Professor Neyman, he informed me it was
among the results of a thesis by R. Satô, Contributions to the theory of testing statistical
composite hypotheses, University of London, 1937, and he kindly sent me a copy of the MS.
I decided nevertheless to publish my version of theorem and proof, since for the reasons
indicated in section 1 this theory should be available in the literature.

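The necessary condition (9) for (7) is also sufficient. Denoting by $\mathcal{E}(f \mid w, \Theta)$
the expected value of a function $f(E, \Theta)$ calculated under the assumption that
$E \in w$, equation (9) may be written

\[ \mathcal{E}\Bigl(\prod_{i=2}^{l} \phi_i^{k_i} \Bigm| w_0, \Theta\Bigr) = \mathcal{E}\Bigl(\prod_{i=2}^{l} \phi_i^{k_i} \Bigm| W, \Theta\Bigr). \tag{11} \]

From assumption 5°(a) it then follows that

\[ Q_0(\Phi \mid w_0, \Theta) = Q_0(\Phi \mid W, \Theta) \tag{12} \]

for almost all $\Phi$. Conversely, (12) implies (11).

In a similar manner we get from (8) with the aid of (9),

\[ \mathcal{E}\Bigl(\phi_1 \prod_{i=2}^{l} \phi_i^{k_i} \Bigm| w_0, \Theta\Bigr) = \mathcal{E}\Bigl(\phi_1 \prod_{i=2}^{l} \phi_i^{k_i} \Bigm| W, \Theta\Bigr). \tag{13} \]

We calculate the moments of the function $Q_1(\Phi \mid w, \Theta)$ to be

\[ \mathcal{E}\Bigl(\phi_1 \prod_{i=2}^{l} \phi_i^{k_i} \Bigm| w, \Theta\Bigr), \]

and hence because of 5°(b), (13) implies

\[ Q_1(\Phi \mid w_0, \Theta) = Q_1(\Phi \mid W, \Theta) \tag{14} \]

almost everywhere in the $\Phi$-space. The pair of conditions (12), (14) are
equivalent to the pair (7), (8).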

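In order that $w_0$ be a type $B$ region, it is necessary and sufficient that it satisfy
(12) and (14) and that

\[ P''(w_0 \mid \Theta) \ge P''(w_1 \mid \Theta) \]

for all $w_1$ satisfying (12) and (14). The inequality may be transformed with
the help of 1°(a), 2°, (3), (7), (8), and (10) to

\[ \int_{w_0} \phi_1^2\, p(E \mid \Theta)\, dW \ge \int_{w_1} \phi_1^2\, p(E \mid \Theta)\, dW, \]

which is equivalent to

\[ \int_{-\infty}^{+\infty} \!\cdots\! \int_{-\infty}^{+\infty} \phi_1^2\, p(\phi_1, \Phi \mid w_0, \Theta)\, d\phi_1 \cdots d\phi_l \ge \int_{-\infty}^{+\infty} \!\cdots\! \int_{-\infty}^{+\infty} \phi_1^2\, p(\phi_1, \Phi \mid w_1, \Theta)\, d\phi_1 \cdots d\phi_l . \]

Sufficient for this is

\[ Q_2(\Phi \mid w_0, \Theta) \ge Q_2(\Phi \mid w_1, \Theta). \tag{15} \]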

We note the functions in (12), (14), and (15) are all of the form (4) with
$s = 0, 1, 2$, and propose to transform these to integrals over certain portions
of the sample space $W$. First, we write (4) in the form

\[ Q_s(\Phi \mid w, \Theta) = Q_0(\Phi \mid w, \Theta) \int_{-\infty}^{+\infty} \phi_1^s\, p(\phi_1 \mid \Phi, w, \Theta)\, d\phi_1 = Q_0(\Phi \mid w, \Theta)\, \mathcal{E}(\phi_1^s \mid \Phi, w, \Theta). \tag{16} \]

Next, we consider “surfaces” $S(\Phi, \Theta)$ in $W_+$, constructed as follows: For
any fixed $\Theta$ let $D(\Theta)$ be the $l - 1$ dimensional domain of values of $\phi_i(E, \Theta)$,
$i = 2, 3, \ldots, l$, for $E \in W_+$. A “surface” $S(\Phi, \Theta)$ is the locus of points $E$
for which

\[ \phi_i(E, \Theta) = \phi_i, \text{ a constant}, \qquad i = 2, 3, \ldots, l, \tag{17} \]

the set of constants being in $D(\Theta)$. Over every “surface” we now define a
density $\bar{p}$. Without loss of generality, and to simplify the notation, we shall
assume that the non-singular minor postulated in 4° contains the minor $(\partial\phi_i/\partial x_j)$,
$i = 2, 3, \ldots, l$; $j = 1, 2, \ldots, l - 1$, and denote by $J(E, \Theta)$ its determinant.
For $E$ on $S(\Phi, \Theta)$ we define the density

\[ \bar{p}(E \mid \Theta) = p(E \mid \Theta)/|J(E, \Theta)|, \tag{18} \]

and consider “surface” integrals

\[ \int_{wS(\Phi, \Theta)} F_s(E, \Theta)\, dx_l\, dx_{l+1} \cdots dx_n, \tag{19} \]

where

\[ F_s(E, \Theta) = \phi_1^s(E, \Theta)\, \bar{p}(E \mid \Theta). \tag{20} \]

A “surface” integral (19) is to be distinguished from an ordinary multiple
integral, in that the integrand is not merely a function of $x_l, x_{l+1}, \ldots, x_n$;
there may be several points $E$ on the surface with the same values for these
coordinates, but different values for the integrand. The integral is to be
thought of as follows: The part $wS(\Phi, \Theta)$ of the “surface” $S(\Phi, \Theta)$ is partitioned
into pieces $\Delta S$, on each a point $E$ is chosen, and the value of the integrand at $E$
is multiplied by the “area” of the projection (taken non-negative) of $\Delta S$ on the
$x_l, x_{l+1}, \ldots, x_n$-space. The “surface” integral is the limit of the sum of
such products as the norm of the partition approaches zero.

Denoting the integral (19) by $I(s)$ for the moment, we may calculate that
for $\Phi \in D(\Theta)$

\[ I(s) = I(0)\, \mathcal{E}(\phi_1^s \mid \Phi, w, \Theta), \qquad I(0) = Q_0(\Phi \mid w, \Theta)\, P(w \mid \Theta), \]

and hence we see that the right member of (16) is equal to the integral (19)
divided by $P(w \mid \Theta)$. The desired relationship between the ordinary integrals
(4) and the “surface” integrals (19) is thus

\[ Q_s(\Phi \mid w, \Theta) = \int_{wS(\Phi, \Theta)} F_s(E, \Theta) \prod_{j=l}^{n} dx_j \Big/ P(w \mid \Theta). \tag{21} \]
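
The relation between the “surface” integrals and the ordinary integrals can be checked numerically in a small case. The sketch below rests on assumptions chosen only for illustration and not taken from the paper: $n = l = 2$, independent standard normal coordinates, $w = W$, and the hypothetical statistics $\phi_2 = 2x_1 + x_2$ and $\phi_1 = x_1$. Then $J = \partial\phi_2/\partial x_1 = 2$, the “surface” is the line $2x_1 + x_2 = c$, and the integration runs over the remaining coordinate $x_2$.

```python
import math

def normal_pdf(x, var=1.0):
    return math.exp(-x * x / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def surface_integral(s, c, half_width=12.0, steps=4000):
    """Simpson approximation to the integral of F_s over the line 2*x1 + x2 = c.

    F_s = phi_1**s * p / |J| with phi_1 = x1 and J = 2; the variable of
    integration is x2, the coordinate left free on the "surface".
    """
    h = 2.0 * half_width / steps
    total = 0.0
    for i in range(steps + 1):
        x2 = -half_width + i * h
        x1 = (c - x2) / 2.0                               # point on the line
        density = normal_pdf(x1) * normal_pdf(x2) / 2.0   # p / |J|
        weight = 1 if i in (0, steps) else (4 if i % 2 else 2)
        total += weight * (x1 ** s) * density
    return total * h / 3.0

if __name__ == "__main__":
    c = 2.0
    print(surface_integral(0, c), normal_pdf(c, 5.0))
    print(surface_integral(1, c), 0.4 * c * normal_pdf(c, 5.0))
```

With $s = 0$ the integral reproduces the density of $\phi_2$, which here is normal with variance 5; with $s = 1$ it equals that density multiplied by the conditional expectation of $\phi_1$ given $\phi_2 = c$, namely $2c/5$, in agreement with the relation between $I(s)$ and $I(0)$ stated above.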

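The conditions (12), (14), (15) may now be written

\[ \int_{w_0 S(\Phi, \Theta)} F_s(E, \Theta) \prod_{j=l}^{n} dx_j = \alpha \int_{S(\Phi, \Theta)} F_s(E, \Theta) \prod_{j=l}^{n} dx_j, \qquad s = 0, 1, \tag{22} \]

\[ \int_{w_0 S(\Phi, \Theta)} F_2(E, \Theta) \prod_{j=l}^{n} dx_j \ge \int_{w_1 S(\Phi, \Theta)} F_2(E, \Theta) \prod_{j=l}^{n} dx_j, \tag{23} \]

if $\Phi$ is in the domain $D(\Theta)$, else they are satisfied trivially. $w_0$ will be a type $B$
region if equations (22) are satisfied for almost all $\Phi \in D(\Theta)$, and if (23) is valid
for all $w_1$ satisfying (22).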

We now hold $\Theta$ fixed in $\omega$ and $\Phi$ fixed in $D(\Theta)$, so that $S(\Phi, \Theta)$ is fixed, and the
right members of equations (22) have constant values. The proof [5, p. 11]
of the lemma of Neyman and Pearson giving sufficient conditions that a region
maximize an integral, subject to integral side-conditions, is easily seen to be
valid for our “surface” integrals, and a sufficient condition that $w_0S(\Phi, \Theta)$
have the desired property is then that it be defined by

\[ \phi_1^2(E, \Theta) > a_0 + a_1\, \phi_1(E, \Theta), \tag{24} \]

where $a_0$, $a_1$ are independent of $E$ on $S(\Phi, \Theta)$, and are such that equations (22)
are satisfied. Since $\Theta$ and $\Phi$ are fixed, we may permit $a_i$ to be of the nature
$a_i = a_i(\Phi, \Theta)$, $i = 0, 1$. Introducing functions $k_1 < k_2$, $k_i = k_i(\Phi, \Theta)$, and
defining $a_0$, $a_1$ from

\[ a_0 = -k_1 k_2, \qquad a_1 = k_1 + k_2, \]

the inequality (24) is satisfied if (6) is. Still holding $\Theta$ fixed, suppose that
$k_1$, $k_2$ can be determined for all $\Phi$ (hence almost all $\Phi$) in $D(\Theta)$ so that for the
part $w_0S(\Phi, \Theta)$ of $S(\Phi, \Theta)$, defined by (6), the equations (22) are satisfied. The
parts $w_0S(\Phi, \Theta)$ of “surfaces” then sweep out a “solid” $w_0(\Theta)$ in $W_+$, defined
by (6). If we can similarly determine $k_1$ and $k_2$, and hence $w_0(\Theta)$, for every $\Theta$
in $\omega$, and if furthermore $w_0(\Theta)$ is independent of $\Theta$, then it is the type $B$ region
we seek.
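
The step from the quadratic inequality (24) to the two-tailed form (6) is elementary algebra: with $a_0 = -k_1 k_2$ and $a_1 = k_1 + k_2$ the inequality reads $(\phi_1 - k_1)(\phi_1 - k_2) > 0$. A minimal check, with hypothetical function names and arbitrarily chosen limits:

```python
def in_region_6(phi1, k1, k2):
    """Two-tailed form: phi1 below k1 or above k2."""
    return phi1 < k1 or phi1 > k2

def in_region_24(phi1, k1, k2):
    """Quadratic form with a0 = -k1*k2 and a1 = k1 + k2."""
    a0 = -k1 * k2
    a1 = k1 + k2
    return phi1 * phi1 > a0 + a1 * phi1

if __name__ == "__main__":
    k1, k2 = -1.5, 2.0
    values = [i / 4.0 for i in range(-20, 21)]
    print(all(in_region_6(v, k1, k2) == in_region_24(v, k1, k2) for v in values))
```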

The equations (22) have now served their main purpose, and we return to
their equivalents, (12) and (14). For $w_0(\Theta)$ defined by (6)

\[ p(\phi_1, \Phi \mid w_0, \Theta) = p(\phi_1, \Phi \mid W, \Theta)/\alpha \quad \text{if } \phi_1 < k_1 \text{ or } \phi_1 > k_2, \]

and vanishes otherwise, and hence equations (12) and (14) are equivalent to (5).

The remainder of the proof consists of deducing that $k_1$, $k_2$ exist, and that the
associated region $w_0(\Theta)$ is independent of $\Theta$, for all $\Theta \in \omega$, from the hypothesis
of our theorem that $k_1$, $k_2$ exist for some $\Theta = \Theta'$. By 1°(c), $\Theta'$ lies on a line
segment $L$ entirely in $\omega$, on which all but one of the nuisance parameters, say
$\theta_2$, are constant. Let us vary $\Theta$ over $L$. Then $\theta_1^0, \theta_3, \ldots, \theta_l$ remain fixed
and $\theta_2$ varies over an interval $I$. The equations (2) for $j = 2$ now become
ordinary differential equations in which the independent variable is $\theta_2$, the
dependent variables are $\phi_2, \phi_3, \ldots, \phi_l$, and $\theta_1^0, \theta_3, \ldots, \theta_l$ are parameters.
A well known existence theorem assures us of the existence of particular
solutions $u_i$ and a non-singular (for all $\theta_2$ in $I$) matrix $(u_{ij})$ of complementary
solutions, $i, j = 2, 3, \ldots, l$, such that the general solution is

\[ \phi_i = u_i + \sum_{j=2}^{l} u_{ij}\, c_j . \]

The $u_i$ are determined by initial conditions for the system (2) with $j = 2$, and
the $u_{ij}$ by sets of initial conditions for the corresponding complementary system.
Clearly, if these initial conditions are all chosen independent of $E$, then since
the coefficients of the differential equations are all independent of $E$, the
solutions $u_i$ and $u_{ij}$ enjoy the same property. On the other hand, the $c_j$ are
independent of $\theta_2$. Hence

\[ \phi_i(E, \theta_2) = u_i(\theta_2) + \sum_{j=2}^{l} u_{ij}(\theta_2)\, c_j(E), \qquad i = 2, 3, \ldots, l. \tag{25} \]

The dependence of the $\phi$'s, $u$'s and $c$'s on the parameters $\theta_1^0, \theta_3, \ldots, \theta_l$ has
not been indicated, since these remain fixed throughout the present calculations.

Let $\mathfrak{D}$ be the $l - 1$ dimensional domain of the values of $c_j(E)$ for $E \in W_+$,
and $C$: $(c_2, c_3, \ldots, c_l)$ be a point in $\mathfrak{D}$, and denote by $S(C)$ the “surface”
$c_j(E) = c_j$. Denote the surface $S(\Phi, \Theta)$ defined in (17) by $S(\Phi, \theta_2)$, and the
domain $D(\Theta)$ of $\Phi$ by $D(\theta_2)$. Then since $|u_{ij}| \ne 0$, for every $\theta_2 \in I$,
every $S(C)$ with $C \in \mathfrak{D}$ is identical with some $S(\Phi, \theta_2)$ with $\Phi \in D(\theta_2)$, and vice
versa. From this we conclude for later reference: (A) the functions $c_j(E)$
are constant on every $S(\Phi, \theta_2)$; (B) if $\theta_2'$, $\theta_2''$ are any two values in $I$, then for
every $\Phi' \in D(\theta_2')$ there exists $\Phi'' \in D(\theta_2'')$ such that $S(\Phi', \theta_2')$ is identical
with $S(\Phi'', \theta_2'')$, and vice versa.

Now let us integrate with respect to $\theta_2$ the equation

\[ \partial \log p(E \mid \theta_2)/\partial\theta_2 = \phi_2 = u_2(\theta_2) + \sum_{j=2}^{l} u_{2j}(\theta_2)\, c_j(E). \]

The result is

\[ \log p(E \mid \theta_2) = \log v(\theta_2) + \sum_{j=2}^{l} v_j(\theta_2)\, c_j(E) + \log f(E), \]

where $v(\theta_2)$, $v_j(\theta_2)$, $f(E)$, and all new undefined symbols in the sequel have
obvious meanings. We get

\[ p(E \mid \theta_2) = v(\theta_2)\, f(E) \exp\Bigl[\sum_{j=2}^{l} v_j(\theta_2)\, c_j(E)\Bigr]. \tag{26} \]

Next we differentiate the equations (25) with respect to $x_k$, and write the
result in matrix form,

\[ (\partial\phi_i/\partial x_k) = (u_{ij})(\partial c_j/\partial x_k), \qquad i, j = 2, 3, \ldots, l; \quad k = 1, 2, \ldots, l - 1. \]

Taking determinants, we have

\[ J(E, \theta_2) = J_1(\theta_2)\, J_2(E). \tag{27} \]



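Finally, we shall need to know the nature of the dependence of $\phi_1$ on $\theta_2$ and $E$:
From (3),

\[ \partial\phi_1/\partial\theta_2 = A_2(\theta_2) + \sum_{m=1}^{l} B_{2m}(\theta_2)\, \phi_m . \]

Substituting from (25), we get

\[ \partial\phi_1/\partial\theta_2 = B_{21}(\theta_2)\, \phi_1 + A(\theta_2) + \sum_{j=2}^{l} B_j(\theta_2)\, c_j(E), \]

and integrating,

\[ \phi_1(E, \theta_2) = B(\theta_2) \Bigl[ \int \frac{A(\theta_2) + \sum_{j=2}^{l} B_j(\theta_2)\, c_j(E)}{B(\theta_2)}\, d\theta_2 + g(E) \Bigr], \]

where

\[ B(\theta_2) = \exp\Bigl[\int^{\theta_2} B_{21}(\eta)\, d\eta\Bigr]. \tag{28} \]

Thus

\[ \phi_1(E, \theta_2) = \bar{A}(\theta_2) + \sum_{j=2}^{l} \bar{B}_j(\theta_2)\, c_j(E) + B(\theta_2)\, g(E). \tag{29} \]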


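In equations (22) we now use the definitions (20), (18) for the integrands and
then substitute (26), (27), (29). As a result we obtain the equality of

\[ \int_{w_0 S(\Phi, \theta_2)} \Bigl[\bar{A}(\theta_2) + \sum_{j=2}^{l} \bar{B}_j(\theta_2)\, c_j(E) + B(\theta_2)\, g(E)\Bigr]^s \frac{v(\theta_2)\, f(E) \exp\bigl[\sum_{j=2}^{l} v_j(\theta_2)\, c_j(E)\bigr]}{|J_1(\theta_2)\, J_2(E)|} \prod_{j=l}^{n} dx_j \]

and $\alpha$ times the “surface” integral of the same integrand over $S(\Phi, \theta_2)$. Putting
first $s = 0$ and then $s = 1$, and employing the previous conclusion (A), we find
that the equations (22) are equivalent to

\[ \int_{w_0 S(\Phi, \theta_2)} \{g^s(E) f(E)/|J_2(E)|\} \prod_{j=l}^{n} dx_j = \alpha \int_{S(\Phi, \theta_2)} \{g^s(E) f(E)/|J_2(E)|\} \prod_{j=l}^{n} dx_j, \qquad s = 0, 1. \tag{30} \]

Again using the expression (29) for $\phi_1$, and noting from (28) that $B(\theta_2) > 0$,
we may write the inequality (6) in the form

\[ g(E) < \kappa_1(\Phi, \theta_2) \quad\text{or}\quad g(E) > \kappa_2(\Phi, \theta_2), \tag{31} \]

where

\[ \kappa_i(\Phi, \theta_2) = \Bigl[k_i(\Phi, \theta_1^0, \vartheta) - \bar{A}(\theta_2) - \sum_{j=2}^{l} \bar{B}_j(\theta_2)\, c_j(E)\Bigr] \Big/ B(\theta_2). \tag{32} \]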



It follows from our hypothesis that for $\theta_2 = \theta_2'$ (the $\theta_2$ coordinate of $\Theta'$) and any
$\Phi' \in D(\theta_2')$, functions $\kappa_i(\Phi', \theta_2')$ exist such that for the part $w_0S(\Phi', \theta_2')$ of
$S(\Phi', \theta_2')$, defined by (31), equations (30) are satisfied. The region $w_0(\Theta')$ is
“swept out” by $w_0S(\Phi', \theta_2')$ as $\Phi'$ ranges over $D(\theta_2')$. Now let $\Theta''$ be any other
$\Theta \in L$, call its $\theta_2$ coordinate $\theta_2''$, let $\Phi''$ be any $\Phi \in D(\theta_2'')$, and consider the
possibility of finding $\kappa_i(\Phi'', \theta_2'')$ such that on the part $w_0S(\Phi'', \theta_2'')$ of
$S(\Phi'', \theta_2'')$, defined by (31), equations (30) are satisfied. From the conclusion (B),
$S(\Phi'', \theta_2'')$ is identical with $S(\Phi', \theta_2')$ for a suitably chosen $\Phi' \in D(\theta_2')$. Hence if
we take $\kappa_i(\Phi'', \theta_2'') = \kappa_i(\Phi', \theta_2')$, then $w_0S(\Phi'', \theta_2'')$ becomes identical with
$w_0S(\Phi', \theta_2')$, where equations (30) are already satisfied. Letting $\Phi''$ range over
$D(\theta_2'')$, every $w_0S(\Phi'', \theta_2'')$ thus determined becomes identical with some
$w_0S(\Phi', \theta_2')$, and vice versa, by (B). Thus the region $w_0(\Theta'')$ “swept out” is
identical with $w_0(\Theta')$. This process defines $\kappa_i(\Phi, \theta_2)$ for all $\theta_2 \in I$ and
$\Phi \in D(\theta_2)$, and hence determines $k_i(\Phi, \theta_1^0, \vartheta)$ from (32). We now have functions
$k_i(\Phi, \theta_1^0, \vartheta)$, $k_1 < k_2$, satisfying (5), and corresponding regions $w_0(\Theta)$
independent of $\Theta$, for all $\Theta \in L$. To conclude the proof, we use 1°(c) to reach any
point $\Theta$ in $\omega$ from $\Theta'$ by a path consisting of a finite number of segments like $L$
on which only one of the nuisance parameters varies. The definitions of
$k_i(\Phi, \theta_1^0, \vartheta)$ are continued along this path as above and the region $w_0(\Theta)$ is seen
to be independent of $\Theta$ for all $\Theta$ in $\omega$.

The following theorem may be regarded as a generalization of one by Neyman
[6, p. 33] giving sufficient conditions that a type $A$ region be also of type $A_1$:

Theorem 2. Suppose the assumption 1°(b) holds for all $\Theta \in \Omega$. Denote
$\phi_i(E, \theta_1^0, \vartheta)$ by $\phi_i^0$ and let $R(\vartheta)$ be the domain of values of $\phi_1^0, \phi_2^0, \ldots, \phi_l^0$ for
$E \in W_+$ and $\Theta \in \omega$. Then a sufficient condition that a region $w_0$ of type $B$, found
by application of theorem 1, be also of type $B_1$ is that for all $\Theta \in \Omega$ and all $E \in W_+$

\[ p(E \mid \theta_1, \vartheta) = p(E \mid \theta_1^0, \vartheta)\, g(\phi_1^0, \phi_2^0, \ldots, \phi_l^0; \theta_1^0; \theta_1, \vartheta), \tag{33} \]

where $g(y_1, y_2, \ldots, y_l; \theta_1^0; \theta_1, \vartheta)$ is a function such that $\partial^2 g/\partial y_1^2 > 0$ for all
$y_1, y_2, \ldots, y_l$ in $R(\vartheta)$ and $\Theta \in \Omega - \omega$.

For the $w_0$ satisfying the sufficient conditions of theorem 1, the conditions (a),
(b'), (c) of definition 2 are satisfied, and it remains only to verify the condition
(d'). The regions $w_1$ admitted for comparison in (d'), as well as $w_0$, must
satisfy the equations (22) since these are equivalent to the conditions (a), (c).
We recall that $\Theta = (\theta_1^0, \vartheta)$ in equations (22) and rewrite them in a notation
better adapted to our present considerations:

\[ \int_{w_0 S(\Phi^0; \theta_1^0, \vartheta)} [\phi_1^0]^s\, \{p(E \mid \theta_1^0, \vartheta)/|J(E, \theta_1^0, \vartheta)|\} \prod_{j=l}^{n} dx_j = \alpha \int_{S(\Phi^0; \theta_1^0, \vartheta)} [\phi_1^0]^s\, \{p(E \mid \theta_1^0, \vartheta)/|J(E, \theta_1^0, \vartheta)|\} \prod_{j=l}^{n} dx_j, \qquad s = 0, 1, \tag{34} \]

where $\Phi^0 = (\phi_2^0, \phi_3^0, \ldots, \phi_l^0) \in D(\theta_1^0, \vartheta)$.

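To express the condition (d') in a convenient way, we now “shred” the regions
$w_0$, $w_1$ of (d') for every $\theta_1$ by means of the same “surfaces” we have been using
for $\theta_1 = \theta_1^0$: For any $w$ in $W_+$, $\Theta \in \Omega$, and $\Phi^0 \in D(\theta_1^0, \vartheta)$ we define a “surface”
integral

\[ I(\Phi^0, w \mid \theta_1, \vartheta) = \int_{wS(\Phi^0; \theta_1^0, \vartheta)} \{p(E \mid \theta_1, \vartheta)/|J(E, \theta_1^0, \vartheta)|\} \prod_{j=l}^{n} dx_j . \]

Then

\[ P(w \mid \theta_1, \vartheta) = \int \!\cdots\! \int I(\Phi^0, w \mid \theta_1, \vartheta)\, d\phi_2^0\, d\phi_3^0 \cdots d\phi_l^0, \]

and a sufficient condition for (d') is

\[ I(\Phi^0, w_0 \mid \theta_1, \vartheta) \ge I(\Phi^0, w_1 \mid \theta_1, \vartheta) \tag{35} \]

for all $\Theta \in \Omega$ and all $\Phi^0 \in D(\theta_1^0, \vartheta)$.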

Again applying the lemma of Neyman and Pearson to the integrands of the
“surface” integrals in (34) and (35), we find that a sufficient condition that our
region $w_0$ be of type $B_1$ is that there exist functions $b_i(\Phi^0; \theta_1^0, \theta_1, \vartheta)$, $i = 0, 1$,
such that

\[ p(E \mid \theta_1, \vartheta) > p(E \mid \theta_1^0, \vartheta)\, [b_0 + b_1\, \phi_1^0(E, \theta_1^0, \vartheta)] \]

if and only if $E \in w_0$. Employing (33), we may replace this inequality by

\[ g(\phi_1^0, \Phi^0; \theta_1^0; \theta_1, \vartheta) > b_0 + b_1\, \phi_1^0 . \tag{36} \]

Define $b_0$, $b_1$ from

\[ g(k_i, \Phi^0; \theta_1^0; \theta_1, \vartheta) = b_0 + b_1 k_i, \qquad i = 1, 2, \]

where $k_i = k_i(\Phi^0, \theta_1^0, \vartheta)$. Since $k_1 < k_2$, these equations have unique
solutions $b_0$, $b_1$. Now hold $\theta_1$, $\vartheta$, $\Phi^0$ all fixed ($\theta_1 \ne \theta_1^0$) and consider the graphs of
the members of (36) as functions of $\phi_1^0$. From our definition of $b_0$, $b_1$, these
graphs intersect at $k_1$, $k_2$. But by hypothesis, the graph of the left member is
everywhere concave up, and hence for $k_1 < \phi_1^0 < k_2$, it lies below the linear
graph of the right member, and for $\phi_1^0 < k_1$ and $\phi_1^0 > k_2$, it lies above. That
is, (36) is true if and only if $E \in w_0$.
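
The convexity argument just given can be sketched numerically. The example below is an illustration under assumptions that are not in the paper: the function $g$ is taken to be an exponential in its first argument (the kind of ratio that arises for exponential-type densities), the limits are arbitrary, and the function names are hypothetical. The coefficients $b_0$, $b_1$ are those of the chord through the two points of the graph at $k_1$ and $k_2$.

```python
import math

def chord_coefficients(g, k1, k2):
    """Solve g(k_i) = b0 + b1 * k_i, i = 1, 2, for the chord through both points."""
    b1 = (g(k2) - g(k1)) / (k2 - k1)
    b0 = g(k1) - b1 * k1
    return b0, b1

def satisfies_36(g, y, b0, b1):
    """Inequality (36): the graph of g lies above the chord at y."""
    return g(y) > b0 + b1 * y

if __name__ == "__main__":
    g = lambda y: math.exp(0.7 * y)   # strictly convex in y (assumed form)
    k1, k2 = -1.0, 2.0
    b0, b1 = chord_coefficients(g, k1, k2)
    for y in (-3.0, -1.5, 0.0, 1.0, 2.5, 4.0):
        print(y, satisfies_36(g, y, b0, b1))
```

For a strictly convex $g$ the inequality holds exactly at the points outside the interval between $k_1$ and $k_2$, which is the two-tailed region of theorem 1.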

5. Appendix on the moment problem. Easily applied criteria [8] are
available for the moment problem of assumption 5°(a). The moment problem 5°(b)
is much more difficult, however, because the function to be determined by its
moments is not of constant sign. Below we offer a proof that the solutions of
both problems 5°(a) and 5°(b) are unique in the important case where $p(E \mid \Theta)$
is a multivariate normal p.d.f. and $\phi_1, \phi_2, \ldots, \phi_l$ are polynomials in
$x_1, x_2, \ldots, x_n$ of degree $\le 2$ and not necessarily homogeneous. Since $\Theta$ is held
fixed, we will not indicate dependence on $\Theta$, nor will the dependence of various
functions on $s$ be indicated, since $s = 0$ or else 1 throughout.

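Let $w_1$, $w_2$ be any two regions, $\alpha_j = P(w_j) \ne 0$, for which the moments of
$Q_s(\Phi \mid w_1)$ and $Q_s(\Phi \mid w_2)$ are the same. To prove the equality (almost
everywhere) of these two functions it suffices to prove that their Fourier transforms
are identical [7, theorem 61]. Suppressing the customary multiple of $\sqrt{2\pi}$, the
Fourier transform of $Q_s(\Phi \mid w_j)$ is

\[ \Psi_j(\mathbf{t}) = \int_{-\infty}^{+\infty} \!\cdots\! \int_{-\infty}^{+\infty} e^{i\,\mathbf{t}\cdot\Phi}\, Q_s(\Phi \mid w_j)\, d\phi_2 \cdots d\phi_l, \]

where $\mathbf{t}$ is the vector $(t_2, t_3, \ldots, t_l)$ and $\mathbf{t}\cdot\Phi = t_2\phi_2 + \cdots + t_l\phi_l$. From (4)
we get

\[ \Psi_j(\mathbf{t}) = \int_{-\infty}^{+\infty} \!\cdots\! \int_{-\infty}^{+\infty} e^{i\,\mathbf{t}\cdot\Phi}\, \phi_1^s\, p(\phi_1, \Phi \mid w_j)\, d\phi_1\, d\phi_2 \cdots d\phi_l = \mathcal{E}(e^{i\,\mathbf{t}\cdot\Phi} \phi_1^s \mid w_j) = \frac{1}{\alpha_j} \int_{w_j} e^{i\,\mathbf{t}\cdot\Phi}\, \phi_1^s(E)\, p(E)\, dW. \]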

A device of Cramér and Wold [8] for reducing the dimensionality of the
problem now suggests itself. Let $z$ be a scalar variable and consider
$\psi_j(z \mid \mathbf{t}) = \Psi_j(z\mathbf{t})$ for fixed $\mathbf{t}$ as a function of $z$. Obviously if for every fixed $\mathbf{t}$,
$\psi_1(z \mid \mathbf{t}) = \psi_2(z \mid \mathbf{t})$, then $\Psi_1(\mathbf{t}) = \Psi_2(\mathbf{t})$, and we are through. We propose to
prove the former equality by showing first that $\psi_j$ is an analytic function of $z$
for all real $z$ and secondly that the coefficients of the power series for $\psi_1$ and
$\psi_2$ in powers of $z$ are equal. Holding $\mathbf{t}$ fixed now, $\xi = \mathbf{t}\cdot\Phi$ is a polynomial of
degree $\le 2$, and

\[ \psi_j(z \mid \mathbf{t}) = \frac{1}{\alpha_j} \int_{w_j} e^{iz\xi}\, \phi_1^s\, p\, dW. \tag{37} \]

By our assumption of normality,

\[ p = C \exp\Bigl[-\sum_{h,k=1}^{n} a_{hk}\, y_h y_k\Bigr], \qquad y_k = x_k - \mu_k, \]

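where the matrix $(a_{hk})$ is positive definite. To prove the analyticity of $\psi_j$
for any real $z = z_0$, let $z = z_0 + \zeta$, and restrict $\zeta$ to real values. Substitute
in (37)

\[ e^{iz\xi} = e^{iz_0\xi} \Bigl[\sum_{q=0}^{m-1} \frac{(i\zeta\xi)^q}{q!} + f_m(\zeta\xi)\, \frac{(\zeta\xi)^m}{m!}\Bigr], \]

where $|f_m(\zeta\xi)| \le 1$. Then

\[ \psi_j(z_0 + \zeta \mid \mathbf{t}) = \sum_{q=0}^{m-1} \frac{i^q \zeta^q}{q!\, \alpha_j} \int_{w_j} e^{iz_0\xi}\, \xi^q \phi_1^s\, p\, dW + R_{jm}(z_0, \zeta), \]

where

\[ R_{jm} = \frac{\zeta^m}{m!\, \alpha_j} \int_{w_j} e^{iz_0\xi} f_m(\zeta\xi)\, \xi^m \phi_1^s\, p\, dW, \]

and all integrands are absolutely integrable over $W$. Let $\sigma$ be the sphere of
unit radius with center at $(\mu_1, \mu_2, \ldots, \mu_n)$ in $W$ and write

\[ R_{jm} = \frac{\zeta^m}{m!\, \alpha_j} \Bigl[\int_{w_j \sigma} + \int_{w_j (W - \sigma)}\Bigr]. \]

Call the two terms of the right member $R'_{jm}$ and $R''_{jm}$:

\[ R_{jm} = R'_{jm} + R''_{jm}. \]

Let $M = \max |\xi|$, $M_1 = \max |\phi_1^s|$, for $E \in \sigma$. Then

\[ |R'_{jm}| \le \frac{|\zeta|^m M^m M_1}{m!\, \alpha_j} \int_{\sigma} p\, dW \le \frac{M_1\, |M\zeta|^m}{m!\, \alpha_j}. \]

Hence $R'_{jm} \to 0$ for all real $\zeta$ as $m \to \infty$.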
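
The bound on the remainder factor in the expansion of the exponential can be checked directly. The sketch below assumes the expansion in the form $e^{iu} = \sum_{q<m} (iu)^q/q! + f_m(u)\, u^m/m!$ for real $u$, which is the standard Taylor expansion with remainder; the function name is hypothetical. Very small values of $u$ combined with large $m$ are avoided because the subtraction then loses accuracy in floating point.

```python
import cmath
import math

def remainder_factor(u, m):
    """f_m(u) defined by exp(iu) = sum_{q<m} (iu)^q / q! + f_m(u) * u^m / m!."""
    partial = sum((1j * u) ** q / math.factorial(q) for q in range(m))
    return (cmath.exp(1j * u) - partial) * math.factorial(m) / u ** m

if __name__ == "__main__":
    for u in (1.0, -2.5, 10.0):
        print(u, [round(abs(remainder_factor(u, m)), 6) for m in range(1, 7)])
```

Every printed modulus is at most one, which is the property used to bound the two parts of the remainder.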


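\[ |R''_{jm}| \le \frac{|\zeta|^m}{m!\, \alpha_j} \int_{W - \sigma} |\xi|^m\, |\phi_1^s|\, p\, dW. \]

Let $r^2 = \sum_k y_k^2$, and $M_2$, $M_3$ be the sums of the absolute values of the
coefficients of the polynomials $\phi_1^s$, $\xi$, respectively, when expanded in powers of $y_k$.
Then for $E \in W - \sigma$, $|\phi_1^s| \le M_2 r^2$, $|\xi| \le M_3 r^2$, $p \le C \exp(-\lambda r^2)$, where
$\lambda > 0$ is the smallest characteristic root of $(a_{hk})$. Hence

\[ |R''_{jm}| \le \frac{C M_2\, |M_3 \zeta|^m}{m!\, \alpha_j} \int_{W - \sigma} r^{2m+2}\, e^{-\lambda r^2}\, dW. \]

Integrating over spherical shells concentric with $\sigma$, $dW = M_4\, r^{n-1}\, dr$, and

\[ |R''_{jm}| \le \frac{C M_2 M_4\, |M_3 \zeta|^m}{m!\, \alpha_j} \int_{1}^{\infty} r^{2m+n+1}\, e^{-\lambda r^2}\, dr \le \frac{C M_2 M_4\, |M_3 \zeta|^m}{m!\, \alpha_j} \int_{0}^{\infty} r^{2m+n+1}\, e^{-\lambda r^2}\, dr. \]

If we evaluate the last integral in terms of a Gamma function and employ
Stirling’s formula we easily find that for $M_3 |\zeta| < \lambda$, $R''_{jm} \to 0$. The
convergence of $R_{jm}$ to zero for real $\zeta$, $|\zeta| < \lambda/M_3$, is sufficient to insure the
analyticity of $\psi_j$.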
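
The behaviour of the last bound as $m$ grows can be sketched with logarithms of Gamma functions. The sketch assumes that the final integral has the form $\int_0^\infty r^{2m+n+1} e^{-\lambda r^2}\, dr$, which equals $\Gamma(m + n/2 + 1)/(2\lambda^{m + n/2 + 1})$; this form is inferred from the polynomial bounds of degree two and is an assumption of the example, as are the function name and the omission of the constant factors that do not depend on $m$.

```python
import math

def log_bound(m, n, lam, m3_zeta):
    """Logarithm of (M3|zeta|)^m / m! times the Gamma-function integral,
    omitting factors that do not depend on m."""
    a = m + n / 2.0 + 1.0
    log_integral = math.lgamma(a) - math.log(2.0) - a * math.log(lam)
    return m * math.log(m3_zeta) - math.lgamma(m + 1.0) + log_integral

if __name__ == "__main__":
    for ratio in (0.8, 1.2):
        print(ratio, [round(log_bound(m, 4, 1.0, ratio), 2) for m in (50, 100, 200, 400)])
```

When $M_3|\zeta|$ is below $\lambda$ the logarithm of the bound decreases without limit, the geometric factor overcoming the power of $m$ supplied by the ratio of Gamma functions; when it is above $\lambda$ the bound grows, so the argument gives no information outside $|\zeta| < \lambda/M_3$.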

Now let $z_0 = 0$. Then the coefficient of $\zeta^q$ in the power series for $\psi_j$ is

\[ \frac{i^q}{q!\, \alpha_j} \int_{w_j} \xi^q\, \phi_1^s\, p\, dW, \]

a linear combination (the same for $j = 1, 2$) of the $q$-th order moments of
$Q_s(\Phi \mid w_j)$, and hence corresponding coefficients for $\psi_1$ and $\psi_2$ are equal.



REFERENCES

[1] J. Neyman, “Sur la vérification des hypothèses statistiques composées,” Bull. Soc.
Math. France, Vol. 63 (1935), pp. 246-266.

[2] J. Neyman, “Outline of a theory of statistical estimation based on the classical theory
of probability,” Phil. Trans. Roy. Soc. London, A, Vol. 236 (1937), pp. 333-380.

[3] J. Neyman, “On a statistical problem arising in routine analyses and in sampling
inspections of mass production,” Annals of Math. Stat., Vol. 12 (1941), pp. 46-76.

[4] J. Neyman and E. S. Pearson, “On the problem of the most efficient tests of statistical
hypotheses,” Phil. Trans. Roy. Soc. London, A, Vol. 231 (1933), pp. 289-337.

[5] J. Neyman and E. S. Pearson, “Contributions to the theory of testing statistical
hypotheses, Part I,” Stat. Res. Mem., Vol. 1 (1936), pp. 1-37.

[6] J. Neyman and E. S. Pearson, “Contributions to the theory of testing statistical
hypotheses, Part II,” Stat. Res. Mem., Vol. 2 (1938), pp. 25-36.

[7] S. Bochner, Vorlesungen über Fouriersche Integrale, Leipzig, 1932.

[8] H. Cramér and H. Wold, “Some theorems on distribution functions,” Jour. London
Math. Soc., Vol. 11 (1936), pp. 290-294.



ON THE PROBLEM OF MULTIPLE MATCHING 

By I. L. Battin 

Drew University 

1. Introduction. The problem of determining the distribution of the number
of “hits” or “matchings” under random matching of two decks of cards has
received attention from a number of authors within the last few years. In 1934
Chapman [2] considered pairings between two series of $t$ elements each, and
later [3] generalized the problem to series of $u$ and $t$ ($< u$) elements respectively.
In the same paper he also considered the distribution of the mean number of
correct matchings resulting from $n$ independent trials, and gave a method, and
tables, for determining the significance of any obtained mean. In 1937 Bartlett
[1] considered matchings of two decks of cards, using a number of interesting
moment generating functions. In 1937 Huntington [12, 13] gave tables of
probabilities for matchings between decks with the compositions $(5^5)$, $(4^4)$, and
$(3^3)$, where $(s^t)$ denotes a deck consisting of $s$ of each of $t$ kinds of cards. More
generally $(s_1 s_2 \cdots s_t)$ denotes $s_1$ cards of the first kind, $s_2$ of the second, etc.
Sterne [16] has given the first four moments of the frequency distribution for
the $(5^5)$ case and has fitted a Pearson Type I distribution function to the
distribution. Sterne obtained his results by considering the probabilities in a $5 \times 5$
contingency table. He also considered the $4 \times 4$ and $3 \times 3$ cases. In 1938
Greville [7] gave a table of the exact probabilities for matchings between two
decks of compositions $(5^5)$. Greenwood [4] obtained the variance of the
distribution of hits for matchings between two decks having the respective
compositions $(s^t)$ and $(s_1 s_2 \cdots s_t)$ with $s_1 + s_2 + \cdots + s_t = st = n$, and where it is
not necessary that all the $s$'s should be different from zero. Earlier Wilks [19]
had considered the same problem for $t = 5$ and $n = 25$.

In a very interesting paper Olds [15] in 1938 used permanents to express a
moment generating function suitable for the problem in question. He obtained
factorial moments and the first four ordinary moments about the mean, first
for two decks with composition $(4^4)$, and then for two decks of composition $(s^t)$.
In 1938 Stevens [17] considered a contingency table in connection with
matchings between two sets of $n$ objects each, and gave the means, variances, and
covariances of the single cell entries and various sub-totals of the cell entries.
Stevens [18] also gave a treatment of the problem of matchings between two
decks which was based on elementary considerations. In 1940 Greenwood [0]
gave the first four moments of the distribution of hits between two decks of any
composition whatever, generalizing the problem which had been treated earlier
by Olds [15]. Finally in 1941, Greville [8] gave the exact distribution of hits
for matchings between two decks of arbitrary composition. He also considered
the problem from the standpoint of a contingency table, as had been done
earlier by Stevens.


In 1939 Kullback [14] considered matchings between two sequences obtained
by drawing at random a single element in turn from each of $n$ urns $U_i$ containing
elements of $r$ types $E_j$ in the respective proportions $p_{ij}$. He showed that if
the process of drawing were indefinitely repeated the distribution of hits would
be that of a Poisson series.

The work which has been done thus far applies to the problem of matching 
two decks of cards. In the present paper a method is developed for obtaining 
the moments of the distribution of hits for matchings between three or more 
decks of cards of arbitrary composition. 


2. Matchings between two decks of cards. In the present paper it will be 
convenient to take as the point of departure the method used by Wilks [19] 
in his treatment of the problem of hits occurring under random matching of two 
decks of 25 cards each, namely a target deck with composition $(5^5)$ and a matching 
deck with composition $(s_i)$, $i = 1, 2, 3, \cdots, 5$, $\sum s_i = 25$. He showed that 

(1) $\phi = \dfrac{(x_1 e^{\theta} + x_2 + \cdots + x_5)^5\,(x_1 + x_2 e^{\theta} + x_3 + \cdots + x_5)^5 \cdots (x_1 + x_2 + \cdots + x_5 e^{\theta})^5}{25!/(s_1!\, s_2! \cdots s_5!)}$ 

is a suitable generating function for obtaining the moments of the distribution. 
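In fact, if we define an operator $K_{s_1 s_2 \cdots s_5}$ as 

(2) $K_{s_1 s_2 \cdots s_5}\, u$ = coefficient of $x_1^{s_1} x_2^{s_2} \cdots x_5^{s_5}$ in $u$, 

where $u = u(x_1, x_2, \cdots, x_5)$, and if h denotes the number of hits, then for 
r = 1, 2, $\cdots$, 5, 

(3) $P(h = r)$ = coefficient of $e^{r\theta}$ in $K_{s_1 s_2 \cdots s_5}\, \phi$. 

And it is readily seen that 

(4) $E(h^r) = K_{s_1 s_2 \cdots s_5} \left[\dfrac{\partial^r \phi}{\partial \theta^r}\right]_{\theta = 0}$. 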


Wilks’ function involves a particular order for the target deck. If we are to 
generalize and obtain moments for matchings between more than two decks, 
it is obvious that we must devise a procedure which will, in the case of two 
decks, be perfectly symmetrical and not require that one deck be given a pre¬ 
ferred status. In the case of two decks this is readily accomplished by the use 
of Kronecker deltas, and in the case of three or more decks by the use of obvious 
generalizations of these deltas. 





I. L. BATTIN 


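For two decks of 25 cards each with compositions $(5^5)$ we need only let 

(5) $\phi = \left[\sum_{i,j=1}^{5} x_i y_j\, e^{\delta_{ij}\theta}\right]^{25}$, 

where $\delta_{ii} = 1$; $\delta_{ij} = 0$, $i \ne j$. 
Then, if 

(6) $K_{n_{1i} n_{2j}}\, u$ = coefficient of $x_1^5 x_2^5 \cdots x_5^5\, y_1^5 y_2^5 \cdots y_5^5$ in $u$, 

where $u = u(x_1, x_2, \cdots, x_5, y_1, y_2, \cdots, y_5)$, it readily follows that 

(7) $E(h^r) = \dfrac{K_{n_{1i} n_{2j}} \left[\partial^r \phi / \partial \theta^r\right]_{\theta=0}}{K_{n_{1i} n_{2j}} \left[\phi\right]_{\theta=0}}$. 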




More generally, for two decks of n cards each, the cards being of k types, and 
the decks having compositions $(n_{11}, n_{12}, \cdots, n_{1k})$, $(n_{21}, n_{22}, \cdots, n_{2k})$ respectively, 
we let 

(8) $\phi = \left[\sum_{i,j=1}^{k} x_i y_j\, e^{\delta_{ij}\theta}\right]^{n}$. 


The factors of $\phi$ are in one-to-one correspondence with the n events of dealing 
a card from each of the two decks. The values which can be assumed by the 
subscripts i and j are in one-to-one correspondence with the k types of cards. 
The symbol $x_i$ corresponds to the first deck, $y_j$ to the second, the subscripts i 
and j corresponding to the different types of cards in each deck. The expansion 
of $\phi$ consists of all products which can be formed by choosing one and only one 
pair $x_i y_j$ from each factor of $\phi$ as a factor of the product. In forming any term 
of $\phi$, choosing $x_\alpha y_\alpha$ from any factor of $\phi$ corresponds to dealing a card of type $\alpha$ 
from both decks, and introduces $e^{\theta}$ into the coefficient of the term. Choosing 
$x_\alpha y_\beta$ from any factor corresponds to dealing a card of type $\alpha$ from the first 
deck, $\beta$ from the second. If $\alpha \ne \beta$, then, since $\delta_{ij} = 0$, $i \ne j$, $e^{\theta}$ is not introduced 
into the coefficient. Therefore in the coefficient of any term of $\phi$, $e^{\theta}$ will be 
raised to a power, say s, which is equal to the number of factors of $\phi$ from which 
pairs $x_\alpha y_\alpha$ have been chosen. 
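
The bookkeeping described above can be checked mechanically on very small decks. The following is a minimal sketch, not the author's procedure: the function names `hit_counts_via_phi` and `hit_counts_brute` are hypothetical, decks are given as tuples of counts per card type, and the expansion of the generating function is carried out one factor at a time while discarding terms whose exponents already exceed the target monomial. The result is compared with a direct enumeration of all distinct arrangements of the two decks.

```python
from collections import Counter
from itertools import permutations


def hit_counts_via_phi(deck1, deck2):
    """Return {s: number of deals with s hits}, read off from the expansion of phi.

    Each of the n factors of phi contributes one pair x_i y_j, together with one
    power of e^theta when i == j. Only terms that can still reach the target
    monomial x^deck1 y^deck2 are kept.
    """
    k, n = len(deck1), sum(deck1)
    # state: (x-exponents, y-exponents, power of e^theta) -> coefficient
    terms = Counter({((0,) * k, (0,) * k, 0): 1})
    for _ in range(n):  # one factor of phi per deal
        nxt = Counter()
        for (xe, ye, s), c in terms.items():
            for i in range(k):
                if xe[i] == deck1[i]:  # exponent of x_i would exceed its target
                    continue
                for j in range(k):
                    if ye[j] == deck2[j]:  # exponent of y_j would exceed its target
                        continue
                    x2 = xe[:i] + (xe[i] + 1,) + xe[i + 1:]
                    y2 = ye[:j] + (ye[j] + 1,) + ye[j + 1:]
                    nxt[(x2, y2, s + (1 if i == j else 0))] += c
        terms = nxt
    target = (tuple(deck1), tuple(deck2))
    return {s: c for (xe, ye, s), c in terms.items() if (xe, ye) == target}


def hit_counts_brute(deck1, deck2):
    """Same counts by listing every distinct arrangement of both decks."""
    cards1 = [t for t, m in enumerate(deck1) for _ in range(m)]
    cards2 = [t for t, m in enumerate(deck2) for _ in range(m)]
    counts = Counter()
    for p in set(permutations(cards1)):
        for q in set(permutations(cards2)):
            counts[sum(a == b for a, b in zip(p, q))] += 1
    return dict(counts)


if __name__ == "__main__":
    print(hit_counts_via_phi((2, 1), (1, 2)))
    print(hit_counts_brute((2, 1), (1, 2)))
```

Dividing each count by the total number of deals gives the probability of s hits; the enumeration is only practical for decks of a handful of cards.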

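The total number of ways in which the term 

$x_1^{n_{11}} x_2^{n_{12}} \cdots x_k^{n_{1k}}\, y_1^{n_{21}} y_2^{n_{22}} \cdots y_k^{n_{2k}}$ 

can arise is equal to the number of ways in which two decks of types $(n_{1i})$, $(n_{2j})$ 
respectively can be dealt (where $(n_{1i}) \equiv (n_{11} n_{12} \cdots n_{1k})$ and similarly for $(n_{2j})$). 
But this is given by 

(9) $K_{n_{11} n_{12} \cdots n_{1k},\, n_{21} n_{22} \cdots n_{2k}}\, \phi\,\big|_{\theta=0} = K_{n_{11} n_{12} \cdots n_{1k},\, n_{21} n_{22} \cdots n_{2k}} \left[\sum_{i,j=1}^{k} x_i y_j\right]^{n}$. 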





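The coefficient of $e^{s\theta}$ in $K_{n_{11} \cdots n_{1k},\, n_{21} \cdots n_{2k}}\, \phi$ is the total number of ways in 
which the term $x_1^{n_{11}} \cdots x_k^{n_{1k}}\, y_1^{n_{21}} \cdots y_k^{n_{2k}}$ can be formed subject to the 
restriction that pairs $x_i y_j$ with i = j are chosen from s of the factors of $\phi$. But 
this is precisely the number of ways in which the two decks can be dealt so that 
there will be s hits. Hence if, as above, h is the number of hits, the probability 
that h = s, assuming all permutations in each deck to be equally likely, is 
given by 

(10) $P(h = s) = \dfrac{\text{coefficient of } e^{s\theta} \text{ in } K_{n_{11} \cdots n_{1k},\, n_{21} \cdots n_{2k}}\, \phi}{K_{n_{11} \cdots n_{1k},\, n_{21} \cdots n_{2k}}\, \phi\,\big|_{\theta=0}}$. 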

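Since this is true for all values of s it follows that 

(11) $E(h^r) = \dfrac{K_{n_{11} \cdots n_{1k},\, n_{21} \cdots n_{2k}} \left[\partial^r \phi/\partial\theta^r\right]_{\theta=0}}{K_{n_{11} \cdots n_{1k},\, n_{21} \cdots n_{2k}} \left[\phi\right]_{\theta=0}}$. 

Since 

$\left[\dfrac{\partial \phi}{\partial \theta}\right]_{\theta=0} = n \left[\sum_{i,j=1}^{k} x_i y_j\right]^{n-1} \left(\sum_{i=1}^{k} x_i y_i\right)$, 

we have at once 

(12) $E(h) = \dfrac{n \sum_{i=1}^{k} \left[\dfrac{(n-1)!}{n_{11}! \cdots n_{1,i-1}!\,(n_{1i}-1)!\,n_{1,i+1}! \cdots n_{1k}!}\right] \left[\dfrac{(n-1)!}{n_{21}! \cdots n_{2,i-1}!\,(n_{2i}-1)!\,n_{2,i+1}! \cdots n_{2k}!}\right]}{\left[\dfrac{n!}{n_{11}! \cdots n_{1k}!}\right]\left[\dfrac{n!}{n_{21}! \cdots n_{2k}!}\right]} = \sum_{i=1}^{k} \dfrac{n_{1i}\, n_{2i}}{n}$. 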


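It is an equally straightforward matter to show that 

(13) $E(h^2) = \sum_{i=1}^{k} \left[\dfrac{n_{1i} n_{2i}}{n} + \dfrac{n_{1i}(n_{1i}-1)\, n_{2i}(n_{2i}-1)}{n(n-1)}\right] + \sum_{i \ne j} \dfrac{n_{1i} n_{2i} n_{1j} n_{2j}}{n(n-1)}$ 

and that 

(14) $\sigma_h^2 = \sum_{i=1}^{k} \left[\dfrac{n_{1i} n_{2i}}{n} - \dfrac{n_{1i}^2 n_{2i}^2}{n^2} + \dfrac{n_{1i}(n_{1i}-1)\, n_{2i}(n_{2i}-1)}{n(n-1)}\right] + \sum_{i \ne j} \dfrac{n_{1i} n_{2i} n_{1j} n_{2j}}{n^2(n-1)}$. 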





Evidently any of the $n_{1i}$ and $n_{2j}$ may be zero, provided only that 
$\sum_{i=1}^{k} n_{1i} = \sum_{j=1}^{k} n_{2j} = n$. The case of two decks with unequal numbers of cards m and n 
(m < n), is readily handled by substituting for the smaller deck one obtained 
by adding n - m "blank" cards, that is, cards of any type not already appearing 
in either deck, as indicated by Greville [8], who however obtained his results 
by considering a preferred order for one of the decks. 
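
As an illustration of the blank-card device, the sketch below pads the smaller deck with one extra card type and then evaluates the mean from (12) and the variance from (14) in exact rational arithmetic. It is a sketch under stated assumptions: the function names are hypothetical, both decks are given as count lists over the same card types, and the variance is coded in the form $\sum_i [n_{1i}n_{2i}/n - n_{1i}^2 n_{2i}^2/n^2 + n_{1i}(n_{1i}-1)n_{2i}(n_{2i}-1)/(n(n-1))] + \sum_{i \ne j} n_{1i}n_{2i}n_{1j}n_{2j}/(n^2(n-1))$, which is this editor's reading of (14).

```python
from fractions import Fraction


def pad_with_blanks(deck_a, deck_b):
    """Equalise deck sizes by adding one new 'blank' type to the smaller deck."""
    a, b = list(deck_a), list(deck_b)
    diff = sum(a) - sum(b)
    if diff != 0:
        a.append(max(-diff, 0))  # blanks added to deck A if it is the smaller one
        b.append(max(diff, 0))   # blanks added to deck B if it is the smaller one
    return a, b


def mean_hits(deck_a, deck_b):
    """E(h) = sum_i n_1i * n_2i / n, as in (12)."""
    a, b = pad_with_blanks(deck_a, deck_b)
    n = sum(a)
    return Fraction(sum(x * y for x, y in zip(a, b)), n)


def variance_hits(deck_a, deck_b):
    """Variance of the number of hits, coded from the reading of (14) given above."""
    a, b = pad_with_blanks(deck_a, deck_b)
    n, k = sum(a), len(a)
    same = sum(
        Fraction(x * y, n)
        - Fraction(x * x * y * y, n * n)
        + Fraction(x * (x - 1) * y * (y - 1), n * (n - 1))
        for x, y in zip(a, b)
    )
    cross = sum(
        Fraction(a[i] * b[i] * a[j] * b[j], n * n * (n - 1))
        for i in range(k)
        for j in range(k)
        if i != j
    )
    return same + cross


if __name__ == "__main__":
    print(mean_hits([5] * 5, [5] * 5), variance_hits([5] * 5, [5] * 5))
    print(mean_hits([5, 7, 8, 0, 0], [0, 3, 4, 6, 2]))
```

For the two 25-card decks with five cards of each of five types this gives a mean of 5 and a variance of 25/6; for decks of 20 and 15 cards with counts (5, 7, 8, 0, 0) and (0, 3, 4, 6, 2) the mean is 53/20 = 2.65.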

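Example 1. In the case of the decks treated by Wilks [19], n = 25, k = 5, 
$n_{1i} = n_{2j} = 5$. Hence from (12) 

$E(h) = 5\left\{\dfrac{25}{25}\right\} = 5$, 

and from (14) 

$\sigma_h^2 = \sum_{i=1}^{5} \left(\dfrac{5 \cdot 5}{25} - \dfrac{25 \cdot 25}{(25)^2} + \dfrac{5 \cdot 4 \cdot 5 \cdot 4}{25 \cdot 24}\right) + \sum_{i \ne j} \dfrac{5 \cdot 5 \cdot 5 \cdot 5}{(25)^2\, 24} = \dfrac{25}{6}$. 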



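Example 2. Suppose we have two decks as shown by the scheme 

                      Type of card        Total of all types 
                    1   2   3   4   5 
No. in deck A       5   7   8   0   0            20 
No. in deck B       0   3   4   6   2            15 

Here deck B has five fewer cards than deck A. Hence we must presume that 
there are six types of cards in all, and that the decks have the respective 
distributions (5 7 8 0 0 0) and (0 3 4 6 2 5). We then have at once 

$E(h) = \sum_{i=1}^{6} \dfrac{n_{1i} n_{2i}}{n} = \dfrac{1}{20}\left[0 + 3 \cdot 7 + 4 \cdot 8 + 0 + 0 + 0\right] = 2.65$, 

$\sigma_h^2 = \sum_{i} \left(\dfrac{n_{1i} n_{2i}}{n} - \dfrac{n_{1i}^2 n_{2i}^2}{n^2} + \dfrac{n_{1i}(n_{1i}-1)\, n_{2i}(n_{2i}-1)}{n(n-1)}\right) + \sum_{i \ne j} \dfrac{n_{1i} n_{2i} n_{1j} n_{2j}}{n^2(n-1)}$ 

$= 2.65 - \dfrac{1}{400}\left(3^2 \cdot 7^2 + 4^2 \cdot 8^2\right) + \dfrac{1}{20 \cdot 19}\left(3 \cdot 2 \cdot 7 \cdot 6 + 4 \cdot 3 \cdot 8 \cdot 7\right) + \dfrac{1}{400 \cdot 19}\left(3 \cdot 7 \cdot 4 \cdot 8 + 4 \cdot 8 \cdot 3 \cdot 7\right) \approx 1.596$. 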





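3. Matchings between three decks. Let the three decks be of types 
$(n_{11} n_{12} \cdots n_{1q})$, $(n_{21} n_{22} \cdots n_{2q})$, $(n_{31} n_{32} \cdots n_{3q})$ respectively, with 
$\sum_{i=1}^{q} n_{1i} = \sum_{j=1}^{q} n_{2j} = \sum_{k=1}^{q} n_{3k} = n$, and consider the function 

(15) $\phi = \left[\sum_{i,j,k=1}^{q} x_i y_j z_k\, e^{\delta_{ijk}\theta_{123} + \delta_{ij}\theta_{12} + \delta_{ik}\theta_{13} + \delta_{jk}\theta_{23}}\right]^{n}$ 

where 

(16) $\delta_{iii} = 1$, $\delta_{ijk} = 0$, i, j, k not all equal, 

and the other deltas are the usual Kronecker symbols. 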

Each factor of $\phi$ corresponds to one deal from each of the three decks. The 
symbols x, y, and z correspond respectively to cards in the first, second, and 
third decks. The subscripts i, j, k = 1, 2, $\cdots$, q correspond to the types of 
cards, there being q distinct types. 

Choosing $x_\alpha y_\alpha z_\alpha$ from a factor of $\phi$ corresponds to a deal in which a card of 
type $\alpha$ is dealt from all three decks, and introduces $e^{\theta_{123} + \theta_{12} + \theta_{13} + \theta_{23}}$ into the 
coefficient of the corresponding term in the expansion of $\phi$. Similarly, choosing 
$x_\alpha y_\alpha z_\beta$, $\beta \ne \alpha$, corresponds to a hit between the first and second decks, and 
introduces $e^{\theta_{12}}$ into the coefficient. Similarly choosing $x_\alpha y_\beta z_\alpha$ introduces $e^{\theta_{13}}$; 
$x_\beta y_\alpha z_\alpha$ introduces $e^{\theta_{23}}$. Choosing $x_\alpha y_\beta z_\gamma$, $\alpha \ne \beta \ne \gamma \ne \alpha$, corresponds to a deal 
with no hits, and introduces no powers of e into the coefficient, since all the $\delta$'s 
are zero. 

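Let $K_{n_{1i} n_{2j} n_{3k}}$ be defined by 

(17) $K_{n_{1i} n_{2j} n_{3k}}\, u$ = coefficient of $x_1^{n_{11}} \cdots x_q^{n_{1q}}\, y_1^{n_{21}} \cdots y_q^{n_{2q}}\, z_1^{n_{31}} \cdots z_q^{n_{3q}}$ in $u$. 

Then the coefficient of $e^{h_{123}\theta_{123}}$ in $K_{n_{1i} n_{2j} n_{3k}}\, \phi\,\big|_{\theta_{12}=\theta_{13}=\theta_{23}=0}$ is the number of ways 
in which the cards can be dealt so as to yield precisely $h_{123}$ triples, or hits between 
all three decks. Similarly the coefficient of $e^{h_{12}\theta_{12}}$ in $K_{n_{1i} n_{2j} n_{3k}}\, \phi\,\big|_{\theta_{123}=\theta_{13}=\theta_{23}=0}$ 
is the number of ways in which the cards can be dealt so as to yield precisely $h_{12}$ 
hits between the first and second decks, with corresponding results for the first 
and third ($h_{13}$) and second and third ($h_{23}$) decks. 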
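
The counting interpretation can be checked by brute force on tiny decks. The sketch below is illustrative only and rests on assumptions: the function names are hypothetical, the order of the first deck is held fixed (which does not change the distribution when all permutations of the other decks are equally likely), and a position where all three decks agree is counted both as a triple and as a hit between the first and second decks. The averages are compared with $\sum_i n_{1i} n_{2i} n_{3i}/n^2$ and $\sum_i n_{1i} n_{2i}/n$.

```python
from fractions import Fraction
from itertools import permutations


def expand(deck):
    """Turn counts per type, e.g. (2, 1), into a list of cards, e.g. [0, 0, 1]."""
    return [t for t, m in enumerate(deck) for _ in range(m)]


def three_deck_means(d1, d2, d3):
    """Exact E(h_123) and E(h_12), averaging over all orders of decks 2 and 3."""
    a = expand(d1)  # first deck kept in a fixed order
    total = triples = pairs12 = 0
    for b in permutations(expand(d2)):
        for c in permutations(expand(d3)):
            total += 1
            triples += sum(x == y == z for x, y, z in zip(a, b, c))
            pairs12 += sum(x == y for x, y in zip(a, b))
    return Fraction(triples, total), Fraction(pairs12, total)


def three_deck_means_formula(d1, d2, d3):
    """Closed forms: sum n1*n2*n3 / n^2 for triples, sum n1*n2 / n for pairs."""
    n = sum(d1)
    e123 = Fraction(sum(x * y * z for x, y, z in zip(d1, d2, d3)), n * n)
    e12 = Fraction(sum(x * y for x, y in zip(d1, d2)), n)
    return e123, e12


if __name__ == "__main__":
    decks = ((2, 1), (1, 2), (2, 1))
    print(three_deck_means(*decks))
    print(three_deck_means_formula(*decks))
```

The enumeration grows as the square of n factorial, so it is only a check for decks of a few cards.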

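By the same reasoning as before then, we have 

(18) $E(h_{123}^r) = \dfrac{K_{n_{1i} n_{2j} n_{3k}} \left[\partial^r \phi/\partial\theta_{123}^r\right]_{\theta\text{'s}=0}}{K_{n_{1i} n_{2j} n_{3k}} \left[\phi\right]_{\theta\text{'s}=0}}$, 

(19) $E(h_{12}^r) = \dfrac{K_{n_{1i} n_{2j} n_{3k}} \left[\partial^r \phi/\partial\theta_{12}^r\right]_{\theta\text{'s}=0}}{K_{n_{1i} n_{2j} n_{3k}} \left[\phi\right]_{\theta\text{'s}=0}}$, 

with similar results for $h_{13}$ and $h_{23}$. And it is a straightforward matter to 
show that 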





(20) $E(h_{123}) = n \sum_{i=1}^{q} \left(\prod_{a=1}^{3} \dfrac{n_{ai}}{n}\right)$ 


n(»-l) ± (n^')' 

Vo-l / 


(22) B(hu) — 7ii(7is/n»jb 


(23) E(h») = 4 i 


n" „t-i 


niinifTini 


(24) JS(;i«) = -4 4 


2 ^ T\li'fh]7lij 
TV i.,-1 


i?(^ii) — -j 12 ^ 

tv n’(n ~ 1) 

(26) + Z n(i 


: O ^i>' tht 

L 


*» Tl(* 


12 nii' «»V nj* njr + 12 tii, tin nn nn nit’ 

%,kf^ h, \i*l 


+ 12 TiHnununjjnjfcnjr 

*r*l> Mr J 


with corresponding results for other moments. It is understood each summation 
index takes values from 1 to q. 

As before, if the decks do not all have the same total number of cards it is 
merely necessary to introduce one or more sets of "blank" cards. Thus we 
would replace decks with the compositions (57800), (03462), (00335) by 
hypothetical decks (5780000), (0346250), (0033509) and proceed as before. 

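Example 3. For three decks of 25 cards, consisting of five of each of five 
kinds we have n = 25, $n_{ai} = 5$, a = 1, 2, 3, i = 1, 2, $\cdots$, 5. Hence 

$E(h_{123}) = 25 \sum_{i=1}^{5} \prod_{a=1}^{3} \dfrac{5}{25} = 1$, 

$E(h_{123}^2) = 25 \sum_{i} \left(\dfrac{1}{5}\right)^3 + 25 \cdot 24 \sum_{i} \left(\dfrac{5 \cdot 4}{25 \cdot 24}\right)^3 + 25 \cdot 24 \sum_{i \ne j} \left(\dfrac{5 \cdot 5}{25 \cdot 24}\right)^3 = 1\tfrac{47}{48}$. 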




(25)2 ,2 5 + (25)2(24)3[.2 5 4+ 5‘4“ 

k^T 

+ i 6*4+ E 5*1 


kf^r 


$= 29\tfrac{1}{6}$, 

$\sigma^2_{h_{12}} = 4\tfrac{1}{6}$, 

with similar results for $E(h_{13})$, $E(h_{23})$, $\sigma^2_{h_{13}}$, and $\sigma^2_{h_{23}}$. 


4. Generalization to any number of decks. If the moments of the distribution 
of hits (doubles, triples, quadruples, ...) in matching any number of 
decks are desired, these can be obtained by using an obvious generalization of 
(15). Thus for four decks we would define $\delta_{iiii} = 1$, $\delta_{ijkl} = 0$, i, j, k, l not all 
equal, and use 

(26) <^ = i 

However, it is evident that as the number of decks is increased the summations 
involved and the manipulation of the (generalized) K operators rapidly become 
complicated. 


5. Application of our moment-generating technique to two-way contingency 
tables. The moment-generating technique which we have discussed has wider 
applications than merely to matching problems. As an example of considerable 
interest we shall consider the contingency problem. Consider the array 


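(27) $\begin{array}{c|c} \|n_{\alpha\beta}\| & n_{\alpha\cdot} \\ \hline n_{\cdot\beta} & n \end{array}$, $\quad \alpha = 1, 2, \cdots, r$; $\beta = 1, 2, \cdots, s$, 

and also the function 

(28) $\phi = \prod_{\alpha=1}^{r} \left(x_\beta e^{\theta_{\alpha\beta}}\right)^{n_{\alpha\cdot}} = \prod_{\alpha=1}^{r} \left(\sum_{\beta=1}^{s} x_\beta e^{\theta_{\alpha\beta}}\right)^{n_{\alpha\cdot}}$. 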


If i and j are particular values of $\alpha$ and $\beta$ respectively, then to the i-th row 
of the array corresponds the product $(x_\beta e^{\theta_{i\beta}})^{n_{i\cdot}}$, consisting of $n_{i\cdot}$ identical 
factors $(x_\beta e^{\theta_{i\beta}})$, one such factor corresponding to each of the $n_{i\cdot}$ elements in the 
row. To the j-th column of the array corresponds the $x_j$ which appears in each 
of the factors of $\phi$. To the ij-th cell of the array corresponds $e^{\theta_{ij}}$ which appears 
only in the factors $(x_\beta e^{\theta_{i\beta}})^{n_{i\cdot}}$, and in each of these only as the coefficient of $x_j$. 





The expansion of $\phi$ consists of all products which can be formed by taking as 
factors one and only one element (not summed) $x_\beta e^{\theta_{\alpha\beta}}$ from each factor of $\phi$. 
But taking $x_j e^{\theta_{ij}}$ from one of the factors $(x_\beta e^{\theta_{i\beta}})^{n_{i\cdot}}$ of $\phi$ corresponds exactly to 
putting an element in the ij-th cell of a lattice such as (27). Hence every term 
in the expansion of $\phi$ corresponds to a particular distribution in such a lattice. 
Moreover, all terms of $\phi$ correspond to distributions in which the row totals 
are $n_{\alpha\cdot}$, for we must take $n_{\alpha\cdot}$ elements from the product $(x_\beta e^{\theta_{\alpha\beta}})^{n_{\alpha\cdot}}$. Further, 
those terms in which the $x_\beta$ appear in the particular product $x_1^{n_{\cdot 1}} x_2^{n_{\cdot 2}} \cdots x_s^{n_{\cdot s}}$ 
correspond to distributions in which the column totals are $n_{\cdot 1}, n_{\cdot 2}, \cdots, n_{\cdot s}$, 
since choosing $n_{\cdot j}$ elements $x_j e^{\theta_{\alpha j}}$ corresponds to putting $n_{\cdot j}$ elements in the 
j-th column and some row of the lattice. 

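Expanding $\phi$ we obtain 

(29) $\phi = \cdots + \left[\sum \prod_{\alpha=1}^{r} \dfrac{n_{\alpha\cdot}!}{\prod_{\beta} n_{\alpha\beta}!}\, e^{n_{\alpha\beta}\theta_{\alpha\beta}}\right] x_1^{n_{\cdot 1}} x_2^{n_{\cdot 2}} \cdots x_s^{n_{\cdot s}} + \cdots$ 

where the summation is over all partitions $(n_{\alpha 1} n_{\alpha 2} \cdots n_{\alpha s})$ of the $n_{\alpha\cdot}$ such that 
$(n_{1\beta} n_{2\beta} \cdots n_{r\beta})$ is also a partition of $n_{\cdot\beta}$. It is clear that since every set of 
values of the $n_{\alpha\beta}$ subject to the partition restrictions $\sum_\beta n_{\alpha\beta} = n_{\alpha\cdot}$, $\sum_\alpha n_{\alpha\beta} = n_{\cdot\beta}$ 
corresponds to a particular distribution of n elements in the lattice (27), every 
particular product $\prod_{\alpha} \dfrac{n_{\alpha\cdot}!}{\prod_\beta n_{\alpha\beta}!}$ corresponds to such a distribution, and represents 
the number of ways in which it can arise. Further, the total coefficient displayed 
in (29), namely $\sum \prod_{\alpha=1}^{r} \left[\dfrac{n_{\alpha\cdot}!}{\prod_\beta n_{\alpha\beta}!}\right]$ with the $\theta_{\alpha\beta}$ set equal to zero, represents the total number of ways in which 
distributions with row totals $n_{\alpha\cdot}$ and column totals $n_{\cdot\beta}$ can arise. Setting all the 
$\theta_{\alpha\beta} = 0$ we readily find 

(30) $\sum \prod_{\alpha=1}^{r} \dfrac{n_{\alpha\cdot}!}{\prod_\beta n_{\alpha\beta}!} = K_{n_{\cdot\beta}} (x_1 + x_2 + \cdots + x_s)^{n} = \dfrac{n!}{\prod_\beta n_{\cdot\beta}!}$. 

Hence the probability of any particular distribution $\|n_{\alpha\beta}\|$ with fixed row totals 
$n_{\alpha\cdot}$ and fixed column totals $n_{\cdot\beta}$ is 

(31) $P(\|n_{\alpha\beta}\| \mid n_{\alpha\cdot}, n_{\cdot\beta}) = \dfrac{\prod_\alpha n_{\alpha\cdot}!\ \prod_\beta n_{\cdot\beta}!}{n!\ \prod_{\alpha,\beta} n_{\alpha\beta}!}$. 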
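
A small enumeration makes the probability statement concrete. The sketch below is an illustration under assumptions: the function names are hypothetical, and the probability of a table is taken in the standard form (product of row-total factorials times product of column-total factorials, divided by n! times the product of cell factorials), which is how (31) is read here. It lists every table with the given margins and confirms that the probabilities sum to one.

```python
from fractions import Fraction
from math import factorial, prod


def _rows(total, caps):
    """Yield tuples x with sum(x) == total and 0 <= x[j] <= caps[j]."""
    if not caps:
        if total == 0:
            yield ()
        return
    for x in range(min(total, caps[0]) + 1):
        for tail in _rows(total - x, caps[1:]):
            yield (x,) + tail


def tables(row_totals, col_totals):
    """Yield every table (tuple of rows) with the given row and column totals."""
    if not row_totals:
        if all(c == 0 for c in col_totals):
            yield ()
        return
    for row in _rows(row_totals[0], col_totals):
        remaining = tuple(c - x for c, x in zip(col_totals, row))
        for tail in tables(row_totals[1:], remaining):
            yield (row,) + tail


def table_probability(table):
    """Probability of one table given its own margins (assumed form of (31))."""
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    n = sum(rows)
    num = prod(factorial(m) for m in rows) * prod(factorial(m) for m in cols)
    den = factorial(n) * prod(factorial(x) for r in table for x in r)
    return Fraction(num, den)


if __name__ == "__main__":
    for t in tables((2, 3), (2, 2, 1)):
        print(t, table_probability(t))
```

With row totals (2, 3) and column totals (2, 2, 1) there are five admissible tables, and every cell respects the bounds max(0, column total - n + row total) and min(row total, column total).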



Moments of the $n_{ij}$. Consider now the result of differentiating $\phi$ with respect 
to a particular $\theta_{\alpha\beta}$, say $\theta_{ij}$. We obtain 


(32) 


d<i> 

ddi, 




n ^o-1 ? 

a 


Xx x% 


x:-‘ + 





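where $\sum'$ denotes summation over indices such that $\sum_\alpha n_{\alpha\beta} = n_{\cdot\beta}$, 
$\sum_\beta n_{\alpha\beta} = n_{\alpha\cdot}$. Now $n_{ij} \le \min(n_{i\cdot}, n_{\cdot j})$, but also $n_{ij}$ can never be 
less than $n_{\cdot j} - (n - n_{i\cdot})$. For $n_{\cdot j} = n_{ij} + \sum_{\alpha \ne i} n_{\alpha j}$. Since the maximum value 
of $n_{\alpha j} \le n_{\alpha\cdot}$, the maximum value of $\sum_{\alpha \ne i} n_{\alpha j} \le \sum_{\alpha \ne i} n_{\alpha\cdot}$. Hence 

$n_{ij} = n_{\cdot j} - \sum_{\alpha \ne i} n_{\alpha j} \ge n_{\cdot j} - \sum_{\alpha \ne i} n_{\alpha\cdot} = n_{\cdot j} - (n - n_{i\cdot})$. 

Therefore 

$\max(0,\ n_{\cdot j} - n + n_{i\cdot}) \le n_{ij} \le \min(n_{i\cdot},\ n_{\cdot j})$. 

Accordingly, combining all the terms of (32) in which $n_{ij}$ has a particular value, 
$\gamma$, we have 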

d(/) 

(33) 


+ 


m»n. (ni in.j) 


mo* ( 0 , 


Hi in.j) r “1 

i: y z* n* 


S ^aS^a6 

•e“.^ air-'i?-' ••• x."" + ■•■ . 

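where $\sum^*$ denotes summation and $\prod^*$ multiplication with $n_{ij} = \gamma$. 

Since $\sum^* \prod^* \left[\dfrac{n_{\alpha\cdot}!}{\prod_\beta n_{\alpha\beta}!}\right]$ is precisely the number of distributions $\|n_{\alpha\beta}\|$ for 
which $n_{ij} = \gamma$, it follows that 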


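(34) $E(n_{ij} \mid n_{\alpha\cdot}, n_{\cdot\beta}) = \dfrac{K_{n_{\cdot\beta}} \left[\partial\phi/\partial\theta_{ij}\right]_{\theta_{\alpha\beta}=0}}{K_{n_{\cdot\beta}} \left[\phi\right]_{\theta_{\alpha\beta}=0}}$. 

Similarly it follows that 

(35) $E(n_{ij}^p \mid n_{\alpha\cdot}, n_{\cdot\beta}) = \dfrac{K_{n_{\cdot\beta}} \left[\partial^p\phi/\partial\theta_{ij}^p\right]_{\theta_{\alpha\beta}=0}}{K_{n_{\cdot\beta}} \left[\phi\right]_{\theta_{\alpha\beta}=0}}$, 

(36) $E(n_{ij}^p n_{kl}^q \mid n_{\alpha\cdot}, n_{\cdot\beta}) = \dfrac{K_{n_{\cdot\beta}} \left[\partial^{p+q}\phi/\partial\theta_{ij}^p\,\partial\theta_{kl}^q\right]_{\theta_{\alpha\beta}=0}}{K_{n_{\cdot\beta}} \left[\phi\right]_{\theta_{\alpha\beta}=0}}$, 

where we may have i = k or $i \ne k$, and j = l or $j \ne l$. 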

By straightforward differentiation and reduction we find that for the array 
(27) with given marginal totals $n_{\alpha\cdot}$, $n_{\cdot\beta}$ 

(37) $E(n_{ij}) = \dfrac{n_{i\cdot}\, n_{\cdot j}}{n}$, 


-+- 


( 38 ) 


n 





_2 _ [n“ — niut. + n„) + m. n.,]n,. n., 

<r„„ 

E(K) = + 3^^^^ + —’ 

n”' n 


. nn- 

^ + 6^ + 7”-^ + 


and if z and /c, j and Z are distinct 

(42) S(n.,n*,) =-+ (nVn,. + 

(43) £(n.,n.,) = --+ (<>«., + + - ^ - 

4 4^ n»'nrn)rn)P , «!•!» n*. nf?'n.< 

E(n.,nH)= -- 

(4) (4, 

. m.nl. n.,n\i . n,.n*:.n./n., 
■*■ n'« n«> 

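Moments of the distribution of Chi Square. For the array (27) 

(45) $\chi^2 = \sum_{\alpha,\beta} \dfrac{\left(n_{\alpha\beta} - \dfrac{n_{\alpha\cdot} n_{\cdot\beta}}{n}\right)^2}{\dfrac{n_{\alpha\cdot} n_{\cdot\beta}}{n}} = \sum_{\alpha,\beta} \left[\dfrac{n\, n_{\alpha\beta}^2}{n_{\alpha\cdot} n_{\cdot\beta}} - 2 n_{\alpha\beta} + \dfrac{n_{\alpha\cdot} n_{\cdot\beta}}{n}\right]$. 

Hence, using the above results we can, theoretically, find all the moments of the 
exact distribution of $\chi^2$. It is not difficult to show that 

(46) $E(\chi^2) = \dfrac{n}{n-1}(r - 1)(s - 1)$. 

The value of $E[(\chi^2)^2]$ and the variance of $\chi^2$ were found by straightforward 
application of our methods and the results agreed with those given by 
Haldane [10]. 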
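
The first moment of a cell and the mean of chi square can be verified exactly on a small table. The sketch below is not the author's derivation: it simply averages over all n! ways of attaching column labels to the n elements (each equally likely under fixed margins), using a hypothetical function name `exact_table_moments` and assuming all marginal totals are positive. It returns the exact cell means and the exact mean of chi square, for comparison with the product of the row and column totals divided by n, and with n(r - 1)(s - 1)/(n - 1).

```python
from fractions import Fraction
from itertools import permutations


def exact_table_moments(row_totals, col_totals):
    """Exact E(n_ij) and E(chi^2) for a two-way table with fixed margins."""
    n = sum(row_totals)
    r, s = len(row_totals), len(col_totals)
    row_of = [i for i, m in enumerate(row_totals) for _ in range(m)]
    col_of = [j for j, m in enumerate(col_totals) for _ in range(m)]
    cell_sum = [[0] * s for _ in range(r)]
    chi2_sum = Fraction(0)
    count = 0
    for perm in permutations(col_of):  # every assignment of columns to elements
        cells = [[0] * s for _ in range(r)]
        for i, j in zip(row_of, perm):
            cells[i][j] += 1
        # chi^2 = n * (sum n_ij^2 / (n_i. n_.j) - 1), equivalent to the usual form
        chi2_sum += n * (
            sum(
                Fraction(cells[i][j] ** 2, row_totals[i] * col_totals[j])
                for i in range(r)
                for j in range(s)
            )
            - 1
        )
        count += 1
        for i in range(r):
            for j in range(s):
                cell_sum[i][j] += cells[i][j]
    means = [[Fraction(v, count) for v in row] for row in cell_sum]
    return means, chi2_sum / count


if __name__ == "__main__":
    means, mean_chi2 = exact_table_moments((2, 3), (2, 2, 1))
    print(means)
    print(mean_chi2)
```

For row totals (2, 3) and column totals (2, 2, 1), with n = 5, the mean of chi square comes out as 5/2, in agreement with 5 * 1 * 2 / 4.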

The writer is indebted to Professor Wilks for helpful criticisms and suggestions. 


REFERENCES 

[1] M. S. Bartlett, "Properties of sufficiency and statistical tests," Proc. Roy. Soc., A, 
Vol. 160 (1937), pp. 268-282. 

[2] Dwight Chapman, "The statistics of the method of correct matchings," Amer. Jour. 
Psych., Vol. 46 (1934), pp. 287-298. 

[3] Dwight Chapman, "The generalized problem of correct matchings," Annals of Math. 
Stat., Vol. 6 (1935), pp. 86-96. 

[4] J. A. Greenwood, "Variance of a general matching problem," Annals of Math. Stat., 
Vol. 9 (1938), pp. 56-59. 

[5] J. A. Greenwood, "Variance of the ESP call series," Jour. of Parapsychology, Vol. 2 
(1938), pp. 60-64. 

[6] J. A. Greenwood, "The first four moments of a general matching problem," Annals 
of Eug., Vol. 10 (1940), pp. 290-292. 

[7] T. N. E. Greville, "Exact probabilities for the matching hypothesis," Jour. of 
Parapsychology, Vol. 2 (1938), pp. 55-59. 

[8] T. N. E. Greville, "The frequency distribution of a general matching problem," 
Annals of Math. Stat., Vol. 12 (1941), pp. 350-354. 

[9] J. B. S. Haldane, "The mean and variance of Chi square, when used as a test of 
homogeneity, when expectations are small," Biometrika, Vol. 31 (1940). 

[10] J. B. S. Haldane, "The first six moments of Chi-square for an n-fold table with n 
degrees of freedom when some expectations are small," Biometrika, Vol. 29 
(1939), p. 389. 

[11] J. B. S. Haldane, "The exact value of the moments of the distribution of chi-square, 
used as a test of goodness of fit, when the expectations are small," Biometrika, 
Vol. 29 (1939), p. 133. 

[12] E. V. Huntington, "A rating table for card matching experiments," Jour. of 
Parapsychology, Vol. 4 (1937), pp. 292-294. 

[13] E. V. Huntington, "Exact probabilities in certain card-matching problems," Science, 
Vol. 86 (1937), pp. 499-500. 

[14] Solomon Kullback, "Note on a matching problem," Annals of Math. Stat., Vol. 10 
(1939), pp. 77-80. 

[15] E. G. Olds, "A moment-generating function which is useful in solving certain matching 
problems," Bull. Amer. Math. Soc., Vol. 44 (1938), pp. 407-413. 

[16] T. E. Sterne, "The solution of a problem in probability," Science, Vol. 86 (1937), 
pp. 500-501. 

[17] W. L. Stevens, "Distribution of entries in a contingency table," Annals of Eugenics, 
Vol. 8 (1938), pp. 238-244. 

[18] W. L. Stevens, "Tests of significance for extra sensory perception data," 
Psychological Review, Vol. 46 (1938), pp. 142-150. 

[19] S. S. Wilks, "Statistical aspects of experiments in telepathy," a lecture delivered to 
the Galois Institute of Mathematics, Long Island University, December 4, 1937. 



ON THE CHOICE OF THE NUMBER OF CLASS INTERVALS IN THE 
APPLICATION OF THE CHI SQUARE TEST 

By H. B. Mann and A. Wald¹ 

Columbia University 


Introduction. To test whether a sample has been drawn from a population 
with a specified probability distribution, the range of the variable is divided 
into a number of class intervals and the statistic 

(1) $\chi^2 = \sum_{i=1}^{k} \dfrac{(\alpha_i - N p_i)^2}{N p_i}$ 

computed. In (1) k is the number of class intervals, $\alpha_i$ the number of observations 
in the ith class, $p_i$ the probability that an observation falls into the ith 
class (calculated under the hypothesis to be tested). It is known that under 
the null hypothesis (hypothesis to be tested) the statistic (1) has asymptotically 
the chi-square distribution with k - 1 degrees of freedom, when each $N p_i$ is 
large. To test the null hypothesis the upper tail of the chi-square distribution 
is used as a critical region. 

In the literature only rules of thumb are found as to the choice of the number 
and lengths of the class intervals. It is the purpose of this paper to formulate 
principles for this choice and to determine the number and lengths of the class 
intervals according to those principles. 

If a choice is made as to the number of class intervals it is always possible to 
find alternative hypotheses with class probabilities equal to the class probabilities 
under the null hypothesis. The least upper bound of the "distances" of such 
alternative distributions from the null hypothesis distribution can evidently be 
minimized by making the class probabilities under the null hypothesis equal to 
each other. By the distance of two distribution functions we mean the least 
upper bound of the absolute value of the difference of the two cumulative 
distribution functions. We have therefore based this paper on a procedure by 
which the lengths of the class intervals are determined so that the probability 
of each class under the null hypothesis is equal to 1/k where k is the number of 
class intervals.² 

Let $C(\Delta)$ be the class of alternative distributions with a distance $\ge \Delta$ from the 
null hypothesis. Let $f(N, k, \Delta)$ be the greatest lower bound of the power of the 
chi-square test with sample size N and number of class intervals k with respect 
to alternatives in $C(\Delta)$. The maximum of $f(N, k, \Delta)$ with respect to k is a 
function $\Phi(N, \Delta)$ of N and $\Delta$. It is most desirable to maximize $f(N, k, \Delta)$ for 


¹ Research under a grant in aid from the Carnegie Corporation of New York. 
² This procedure was first used by H. Hotelling ("The consistency and ultimate 
distribution of optimum statistics," Trans. Am. Math. Soc., Vol. 32, pp. 861). It has been 
advocated by E. J. Gumbel in a paper which will appear shortly. 




CLASS INTERVALS IN CHI SQUARE TEST 




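We now assume that N is so large that the joint distribution of the $z_i$ is 
sufficiently well approximated by a multivariate normal distribution. Then 

$E(z_i) = 0$, $\quad E(z_i^4) = 3[E(z_i^2)]^2$, $\quad E(z_i^2 z_j^2) = E(z_i^2)E(z_j^2) + 2[E(z_i z_j)]^2$ for $i \ne j$. 

We have the well known relations 

$E(z_i^2) = E(\alpha_i^2) - N^2 p_i^2 = N p_i(1 - p_i)$, 

$E(z_i z_j) = E(\alpha_i \alpha_j) - N^2 p_i p_j = -N p_i p_j$. 

Using the above equations we obtain 

$\sigma^2_{\sum z_i^2} = \sum_i 2N^2 p_i^2(1 - p_i)^2 + \sum_{i \ne j} \left[N^2 p_i p_j(1 - p_i)(1 - p_j) + 2N^2 p_i^2 p_j^2 - N^2 p_i(1 - p_i)\,p_j(1 - p_j)\right]$ 

$= 2N^2 \left[\sum_i p_i^2(1 - p_i)^2 + \sum_{i \ne j} p_i^2 p_j^2\right]$ 

$= 2N^2 \left[\sum p_i^2 - 2\sum p_i^3 + \left(\sum p_i^2\right)^2\right]$. 

Further 

$E\left[\left(\sum_i z_i \left(p_i - \tfrac{1}{k}\right)\right)^2\right] = N\left[\sum_i p_i(1 - p_i)\left(p_i - \tfrac{1}{k}\right)^2 - \sum_{i \ne j} p_i p_j \left(p_i - \tfrac{1}{k}\right)\left(p_j - \tfrac{1}{k}\right)\right]$ 

$= N\left[\sum p_i^3 - \left(\sum p_i^2\right)^2\right]$. 

Substituting this into the formula for $\sigma^2_{\chi'^2}$ we finally obtain 

(4) $\sigma^2_{\chi'^2} = 2k^2 \left\{\sum p_i^2 + 2(N - 1)\sum p_i^3 - (2N - 1)\left(\sum p_i^2\right)^2\right\}$. 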

2. The Taylor expansion of the power. Let C be determined so that the 
probability under the null hypothesis that ^ (7 is equal to the size Xo of 





H. B. MANN AND A. WALD 


the critical region. Let P 




be the probability under the alternative 


hypothesis that 2 ^ C. Then the power P is given by 

1-1 


(5) 


where 

N 

k 



Hence 


and (5) can be written in the form 

(6) 

p(p.>.C') 


where C' is a certain function of N and k. Let $\delta_i = p_i - \dfrac{1}{k}$, where $p_i$ is the 
probability of the ith class interval under the alternative hypothesis. 

Expanding P into a power series we obtain (in this and the following derivations, 
we take all partial differential quotients at the point $\delta_1 = \delta_2 = \cdots = 
\delta_k = 0$) 


ia.1 o5| 2 dSx *^7 


5*P 
35,35, 


+ 


Since P is a symmetric function of the 5, we have for 5i = 5a = • ■ = 5* = 0 


d^p _ d'‘p 

d'P 

a'p 

dS\ d5\ ’ 

35,35, 

35i 55a 

Furthermore 5i - 0. Therefore 


^ d^P ■ 


2 t=i 

35i dSi 1 


+ 


for i ^ j. 


We shall first show that the terms of second order are always positive. This 
shows that the test is unbiased and justifies again the choice of equal class 
probabilities under the null hypothesis since this assures unbiasedness and 
minimizes among all unbiased tests the g.l.b. of the distances of such alternatives 
whose power is equal to the size of the critical region. 

The power is given by 


P = 


JVl 


■■ptK 




(7) 


> 

t-1 


Since 22 — “22 we obtain for the second order terms 

t""! 1^1 

|2 p \ ^ 

(ai — Qi — «i 0!2)p(ai, as • ■ ■ a*) 22 

aj+a5+ +ajac' 

where 

p(ai ,...«*)= ^ ^-- . 

0'i!a2l • • • 

In the following derivation extend all sums if not otherwise stated over all 
terms for which 22 aj ^ C' and use the relation 22 We have because 

1-1 ,-i 

of the symmetry 


^ 4- V * * _ 

^ ‘ daias2 ^ ' \dBl asi 

22 

aj+a5+ +ajac' 


N\ 


22 aiP(«i, aa, • ■ • a*) = J 22 P(«i) “2 • • • a*) = ^Xo, 

22 aiaapCm < aa, ‘ ‘' a*) = !_ 22 ~ £ P(“i i “ 2 ; •■■“*) 

iV Xo 1 2 / \ 

~ k{k - 1) 2^ «1 “2, ■■•«*:). 


Hence the coefficient of the second order term becomes 

iV* . 

___ _ _ > ffil yMj • . VUi \ _ It- — _ 

k 


4-i22«?p(ai,a2, ..«*) -fxo“ 


= , ^ ^ 22 £ “* p(«i»“ 2 1 ■' ■ “*) ~ ~ 

fO — 1 »«1 


iV' 


k k(Ji 1) 


Xo • 


But 


i-k 


22 22 P(«i, aa, • ‘ ■ at) 


Xq 


> E 


( 5 “’)' 


t—A: 


since the conditional mean for values of 22 a^ ^ C" must be larger than the 





mean 


I—t /♦—t \ 

of all values of 2 a* . Since = --p + A^, we obtain 

1-1 V-i / k K 

oil) 


1 


i-i 


jT—i Z) £p(«i 1 «2 

A — 1 i-l 

^ Xa /N^ L - DA _ X / , iV\ 

k ) ^\kik-l)'^k) 

lp«fc 

and hence the coefficient of 22 is larger than 0. 

1-1 

To prove Theorem 1, we will have to determine the alternative distribution 

for which 22 becomes a minimum subject to the condition that the distance 

1-1 

from the null hypothesis should be greater than or equal to a given A. 

Hence we have to find a distribution function Fix) such that j F{z) — x I ^ A 

for at least one value x and 22 = 22 { P* “ r ) = P? — r is a minimum 

1-1 1—1 \ k / 1—1 k 

where p, = minimizing ^ 5^, we may minimize 

l*fc 

22 Pi I since the two expressions differ merely by a constant. There will be two 

tMl 

different solutions for F{x) depending on whether F{x) — x A or Fix) — x < 
— A for at least one value x. Because of symmetry we restrict ourselves to the 
case in which Fix) — x ^ A for at least one value of x. 

Let a be a value for which F(a) — o ^ A and suppose that 

l-l ^ . I 

— 


then 


We prove first 


Fia) ^ a + A, 

« > A - i. 


Proof: Since ■" ■P’(®) ^ 0 we have 

fQ) - Fia) + fQ) - Fia) ^ a + A 





such values of $\Delta$ for which $\Phi(N, \Delta)$ is neither too large nor too small and in this 
paper we propose to determine $\Delta$ so that $\Phi(N, \Delta)$ is equal to $\tfrac{1}{2}$. 

Hence we introduce the following definitions: 

Definition 1. A positive integer k is called best with respect to the number of 
observations N if there exists a $\Delta$ such that $f(N, k, \Delta) = \tfrac{1}{2}$ and $f(N, k', \Delta) \le \tfrac{1}{2}$ 
for any positive integer k'. 

Definition 2. A positive integer k is called $\epsilon$-best ($0 \le \epsilon \le 1$) with respect to 
the number of observations N if $\epsilon$ is the smallest number in the interval [0, 1] for 
which the following condition is fulfilled: There exists a $\Delta$ such that $f(N, k, \Delta) \ge 
\tfrac{1}{2} - \epsilon$ and $f(N, k', \Delta) \le \tfrac{1}{2} + \epsilon$ for any positive integer k'. 

It is obvious that an $\epsilon$-best k is a best k if $\epsilon = 0$. If $\epsilon$ is very small an $\epsilon$-best 
k is for all practical purposes equivalent to a best k. 

Since $f(N, k, \Delta)$ is a continuous function of $\Delta$ it is easy to see that for any 
pair of positive integers k and N there exists exactly one value $\epsilon$ such that k is 
$\epsilon$-best with respect to the number of observations N. Since the value of this $\epsilon$ 
is a function of k and N we will denote it by $\epsilon(k, N)$. 

Definition 3. A sequence $\{k_N\}$ of positive integers is called best in the limit if 
$\lim_{N \to \infty} \epsilon(k_N, N) = 0$. 


In this paper the following theorem is proved: 

Theorem 1. Let $k_N = 4\sqrt[5]{\dfrac{2(N - 1)^2}{c^2}}$ where c is determined so that 
$\dfrac{1}{\sqrt{2\pi}} \int_c^{\infty} e^{-x^2/2}\, dx$ is equal to the size of the critical region (probability of the critical 
region under the null hypothesis); then the sequence $\{k_N\}$ is best in the limit. 
Furthermore $\lim_{N \to \infty} f(N, k_N, \Delta_N) = \tfrac{1}{2}$ for $\Delta_N = \dfrac{5}{k_N} - \dfrac{4}{k_N^2}$. 

It is further shown that for $N \ge 450$, if the 5% level of significance is used, 
and for $N \ge 300$, if the 1% level of significance is used, the value of $\epsilon(k_N, N)$ 
is small so that for practical purposes $k_N$ can be considered as a best k. The 
authors are convinced, although no rigorous proof has been given, that $\epsilon(k_N, N)$ 
is quite small for $N \ge 200$ and is very likely to be small even for considerably 
lower values of N. 
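
The rule in Theorem 1 is easy to evaluate numerically. The following is a minimal sketch, assuming the readings $k_N = 4\,(2(N-1)^2/c^2)^{1/5}$ and $\Delta_N = 5/k_N - 4/k_N^2$; the function name `mann_wald_k` is hypothetical, and c is taken as the upper-tail point of the standard normal distribution for the chosen size, obtained from the Python standard library.

```python
from statistics import NormalDist


def mann_wald_k(n_obs, size):
    """Return (k_N, Delta_N) for sample size n_obs and critical-region size `size`.

    c is the point with upper-tail normal probability equal to `size`.
    The returned k_N is not rounded; in practice an integer near it is used.
    """
    c = NormalDist().inv_cdf(1.0 - size)
    k = 4.0 * (2.0 * (n_obs - 1) ** 2 / c ** 2) ** 0.2
    delta = 5.0 / k - 4.0 / k ** 2
    return k, delta


if __name__ == "__main__":
    for n_obs in (200, 450, 1000):
        k, delta = mann_wald_k(n_obs, 0.05)
        print(n_obs, round(k, 2), round(delta, 4))
```

For N = 450 at the 5% level the formula gives a value of $k_N$ a little above 43, so the number of class intervals grows only as the two-fifths power of the sample size.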


1. Mean value and standard deviation of the statistic under alternative 
hypotheses. It is well known that every continuous distribution can by a simple 
transformation be transformed into a rectangular distribution with range [0, 1]. 
We may therefore for convenience assume that the hypothesis to be tested is 
that of a rectangular distribution with the range [0, 1]. Moreover as mentioned 
earlier we assume that a procedure is chosen by which the class probabilities 
under the null hypothesis are equal to each other. 

The statistic whose mean value and standard deviation is to be determined is 

$\chi'^2 = \sum_{i=1}^{k} x_i^2$, where $x_i = \sqrt{\dfrac{k}{N}}\left(\alpha_i - \dfrac{N}{k}\right)$. 





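Let $p_i$ be the probability under the alternative hypothesis that one observation 
will fall into the ith class. The probability of obtaining certain specified 
values $\alpha_1, \alpha_2, \cdots, \alpha_k$ is given by 

$f(\alpha_1, \alpha_2, \cdots, \alpha_k) = \dfrac{N!}{\alpha_1!\, \alpha_2! \cdots \alpha_k!}\; p_1^{\alpha_1} p_2^{\alpha_2} \cdots p_k^{\alpha_k}$. 

Since $\sum_{i=1}^{k} \alpha_i = N$ we have 

$\chi'^2 = \dfrac{k}{N} \sum_{i=1}^{k} \alpha_i^2 - N$. 

We consider the function 

$(p_1 e^{t_1} + p_2 e^{t_2} + \cdots + p_k e^{t_k})^N = \sum f(\alpha_1, \alpha_2, \cdots, \alpha_k)\, e^{\alpha_1 t_1 + \alpha_2 t_2 + \cdots + \alpha_k t_k}$. 

Differentiating twice and then setting $t_i = 0$ for i = 1, 2, $\cdots$, k we obtain 

(2) $N(N - 1)p_i^2 + N p_i = E(\alpha_i^2)$, $\quad N(N - 1)p_i p_j = E(\alpha_i \alpha_j)$ for $i \ne j$. 

Hence 

$E\left(\sum \alpha_i^2\right) = N(N - 1) \sum p_i^2 + N$, 

and 

(3) $E(\chi'^2) = k(N - 1) \sum_{i=1}^{k} p_i^2 + k - N$. 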
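
Formula (3) can be confirmed exactly for small N by summing over the multinomial distribution. The sketch below is an illustration with hypothetical function names; it assumes the statistic is $\chi'^2 = (k/N)\sum \alpha_i^2 - N$, that is, statistic (1) with every null class probability equal to 1/k, and it uses exact rational arithmetic so the two sides can be compared for equality.

```python
from fractions import Fraction
from itertools import product
from math import factorial


def expected_chi2_exact(n_obs, p):
    """E(chi'^2) by direct summation over all multinomial outcomes."""
    k = len(p)
    total = Fraction(0)
    for alpha in product(range(n_obs + 1), repeat=k):
        if sum(alpha) != n_obs:
            continue
        prob = Fraction(factorial(n_obs))
        for a, p_i in zip(alpha, p):
            prob = prob * p_i ** a / factorial(a)  # multinomial probability
        stat = Fraction(k, n_obs) * sum(a * a for a in alpha) - n_obs
        total += prob * stat
    return total


def expected_chi2_formula(n_obs, p):
    """E(chi'^2) = k (N - 1) sum p_i^2 + k - N, as in (3)."""
    k = len(p)
    return k * (n_obs - 1) * sum(p_i * p_i for p_i in p) + k - n_obs


if __name__ == "__main__":
    p_alt = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]
    print(expected_chi2_exact(4, p_alt), expected_chi2_formula(4, p_alt))
```

Under the null hypothesis, with every $p_i = 1/k$, the formula reduces to k - 1, the mean of the limiting chi-square distribution.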

To compute the standard deviation of x'* we put 








and 






1 , 
If ^ ^ we can always find a distribution function in CCA) for which p. = 

/c 

1 T—A: 

Hence we consider only the case k > -. We must minimize Yj pl under the 

, , A ,„i 

^ I ^ k — I 

condition Zj Pi = ^ + e, ^ p, = —-e. We therefore minimize 


1-1 


i-i+i 


k 

t-2 




= 2 p* - 2Xi 2 p. - 2 x 2 Y p. ■ 




»-I+l 


This leads to 


Pi = 


5 + 1 

for i = (I + 1), 


k k~l 


We then have 

i-k 




This is smallest if « = A — - and Z = -. The following discontinuous distribu- 

A/ ^ 


tion function gives these values for <, 1 and p, and has the distance A from the 

rectangular distribution. 



nd..[i + 2(A-0] 

for 

0 < X < i - i. 

^ ^ 2 k’ 

TO-i+A-l 

for 

1 1 .1 


^ for 

a; ^ 1, 

F{x) = 0 

for 

0 ^ x, 

F{x) = 1 

for 

a: ^ 1. 


3. Solution for large N. Denote by $F(\Delta, k)$ the distribution function (8) of 
$C(\Delta)$ which makes $\sum_{i=1}^{k} \delta_i^2$ a minimum if the test is made with k class intervals. 

Assume that k is large enough that $\chi'^2$ can be taken as normally distributed. 
The power of the test is then given by 





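(9) $P = \dfrac{1}{\sqrt{2\pi}} \int_{\frac{(k-1) + c\sqrt{2(k-1)} - E(\chi'^2)}{\sigma}}^{\infty} e^{-y^2/2}\, dy$, 

where $\sigma$ is the standard deviation of $\chi'^2$ and c is determined so that 
$\dfrac{1}{\sqrt{2\pi}} \int_c^{\infty} e^{-y^2/2}\, dy$ is equal to the size of the critical region. Hence to maximize 
the power with respect to k is equivalent to maximizing 

$\psi(k) = \dfrac{E(\chi'^2) - (k - 1) - c\sqrt{2(k - 1)}}{\sigma}$ 

with respect to k. 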

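Under the alternative $F(\Delta, k)$ we obtain 

$E(\chi'^2) - (k - 1) = k(N - 1)\sum p_i^2 + k - N - k + 1 = 4(N - 1)\left(\Delta - \dfrac{1}{k}\right)^2$. 

Hence 

$\psi(k) = \dfrac{4(N - 1)\left(\Delta - \dfrac{1}{k}\right)^2 - c\sqrt{2(k - 1)}}{\sigma}$. 

We choose $\Delta$ so that this maximum power is exactly $\tfrac{1}{2}$, that is, so that $\psi(k) = 0$ 
for that k which maximizes $\psi(k)$. Denote this value of $\Delta$ by $\Delta_N$ and let $k_N$ be 
the value of k which maximizes $\psi(k)$. The differential quotient of the numerator 
of $\psi(k)$ with respect to k is then equal to 0 for $k = k_N$. Hence 

(10) $\dfrac{8(N - 1)}{k_N^2}\left(\Delta_N - \dfrac{1}{k_N}\right) = \dfrac{c}{\sqrt{2(k_N - 1)}}$. 

Furthermore since $\psi(k_N) = 0$ we have 

(11) $4(N - 1)\left(\Delta_N - \dfrac{1}{k_N}\right)^2 = c\sqrt{2(k_N - 1)}$. 

Solving equations (10) and (11) we obtain 

(12) $\Delta_N = \dfrac{5}{k_N} - \dfrac{4}{k_N^2}$ 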
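
The pair of conditions (10) and (11) can also be solved numerically without the large-k simplification. The sketch below is an editor's illustration, not part of the paper: it assumes that dividing (11) by (10) gives $\Delta_N - 1/k_N = 4(k_N - 1)/k_N^2$, substitutes this into (11), and finds the root for k by bisection on the interval above 3. The function names are hypothetical, and the bracket assumes N is large enough for the left-hand side to exceed the right-hand side at k = 3.

```python
from math import sqrt
from statistics import NormalDist


def solve_k_exact(n_obs, size, lo=3.0, hi=1.0e4, iters=200):
    """Root of 64 (N-1) (k-1)^2 / k^4 = c sqrt(2 (k-1)) for k > 3, by bisection."""
    c = NormalDist().inv_cdf(1.0 - size)

    def g(k):
        # (11) with Delta - 1/k replaced by 4 (k - 1) / k^2; decreasing for k > 2
        return 64.0 * (n_obs - 1) * (k - 1) ** 2 / k ** 4 - c * sqrt(2.0 * (k - 1))

    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if g(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def closed_form_k(n_obs, size):
    """Large-k approximation 4 * (2 (N-1)^2 / c^2)^(1/5)."""
    c = NormalDist().inv_cdf(1.0 - size)
    return 4.0 * (2.0 * (n_obs - 1) ** 2 / c ** 2) ** 0.2


if __name__ == "__main__":
    print(solve_k_exact(450, 0.05), closed_form_k(450, 0.05))
```

For N = 450 at the 5% level the two values differ by less than one class interval, which is in keeping with the flatness of the maximum.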


and 





or since kn > 3, 




Hence 

(13) either k. 




or 




is the value of k for which the power with respect to $F(\Delta_N, k)$ becomes a 
maximum. We have merely to show that $\psi''(k)$ is negative for $k = k_N$. 

Using the fact that $\psi(k_N) = \psi'(k_N) = 0$ we obtain 


k% {'\/2{kti — 1 ))* 

Substituting for Ay the right hand side of (12) we obtain on account of (10) 
)_ -56(W - 1) , 64(W - 1) ^ 8(W- 1) /4 4\ 

Using 2(k — 1) > A we obtain 

r(ky) < ^-24(W - 1) -t- ^ (iV - 1)) 

which is negative. <r' can be shown to be of order Ay ; ^"(Ay) is, therefore, of 

The maximum is, therefore, rather flat for large 


order 


I- = 


values of N. 

We shall now show that if k is large enough to assume $\chi'^2$ to be normally 
distributed then $F(\Delta, k)$ is the alternative which gives the smallest power 
compared with all alternatives in the class $C(\Delta)$ provided the power for the 
alternative $F(\Delta, k)$ equals $\tfrac{1}{2}$. 

We know that $E(\chi'^2)$ is smallest for $F(\Delta, k)$. Since the power with respect 
to $F(\Delta, k)$ equals $\tfrac{1}{2}$ we have 

$E(\chi'^2) - (k - 1) - c\sqrt{2(k - 1)} = 0$. 

Thus the lower limit of the integral in (9) becomes negative for every other 
alternative and the power will be larger than $\tfrac{1}{2}$. 

The power with respect to $F(\Delta_N, k_N)$ is equal to $\tfrac{1}{2}$; hence if we choose $k = k_N$ 
the power of the test will be $\ge \tfrac{1}{2}$ for all alternatives in the class $C(\Delta_N)$. On the 
other hand if we choose $k \ne k_N$ then there will be at least one alternative in 


³ Cantelli's formula and its proof are given by Fréchet in his book Recherches Théoriques 
Modernes sur la Théorie des Probabilités, Paris (1937), pp. 123-126. 





$C(\Delta_N)$ for which the power is $< \tfrac{1}{2}$. (For instance $F(\Delta_N, k)$ is such an 
alternative.) 

The above statements have been derived under the assumption that $\chi'^2$ is 
normally distributed. Hence if the distribution of $\chi'^2$ were exactly normal 
$k_N = 4\sqrt[5]{\dfrac{2(N - 1)^2}{c^2}}$ would be a best k and for this $k_N$ and $\Delta_N = \dfrac{5}{k_N} - \dfrac{4}{k_N^2}$ the 
greatest lower bound of the power in the class $C(\Delta_N)$ would be exactly $\tfrac{1}{2}$. Since 
the distribution of $\chi'^2$ approaches the normal distribution with $k \to \infty$, the 
sequence $\{k_N\}$ is best in the limit and Theorem 1 stated in the introduction 
is proved. 

For the purposes of practical applications, it is not enough to know that
$\{k_N\}$ is best in the limit. We have to know for what values of N $k_N$ can be
considered practically as a best k, i.e. for what values of N the quantity
$\epsilon(k_N, N)$ defined in the introduction is sufficiently small. The quantity
$\epsilon(k_N, N)$ is certainly small if for the number of class intervals $k_N$ the
distribution of $\chi'^2$ is near to normal and if the power with respect to at least
one alternative of the class $C(\Delta_N)$ is smaller than ½ also in the case when the
number of class intervals is too small to assume a normal distribution for $\chi'^2$.

We shall in the following assume that for k > 13 the normal distribution is a
sufficiently good approximation. Actually we need not assume a normal
distribution but only that the probability is close to ½ that the statistic will
exceed its mean value.

Cantelli* gave the following formula. Let $M_r$ be the rth moment of a
distribution about $x_0$. Let d be any arbitrary positive number. Let
$P(|x - x_0| < d)$ be the probability that $|x - x_0| < d$; then the following
inequalities hold:


If 


If 


d' d’-- 


Mr > 


Mtr 


then 


then 


Mr 


P(|x-Xo| ^ d) ^ 1 - 

p(|x-xoKd)^i - — . 

(d — Mr) + Mir — Mr 


Since x'* can only take positive values we have 

(14) If 4 then Fix'" ^ c*) ^ 1 - . 

Cic Ct Cif 


cl 


If S 


(15) 


Ck 


cl 


then P(x'’ < c*) ^ 1 - 




(c* - 

where $c_k$ is determined so that $P(\chi^2 \ge c_k)$ equals the size of the critical region
if the null hypothesis is true and the number of class intervals equals k. $c_k$ can
be obtained from a table of the chi-square distribution.

For P(Ay , A)) we obtain with Ay = M — 1 from (3) and (4) 

flJ.V fC 




CLASS INTERVALS IN CHI SQUARE TEST 




E{/) = m - i ) a ;, 

4^2 = 2{k - 1) + 8Ai'(fc + 2Ar - 4) - 32(2i\r - 1 )a;. 

By numerically calculating E{x') and for N = 450 and a 5% level of sig¬ 
nificance, for iV = 300 and a 1% level of significance, and fori; = 13,12 ■ ■ ■ 

-f 1 it can be shown that for these values of N and k 


1 

LA.J 


(16) 


E(/) . 4 - + 1 %'*)]' 

y 


C; 


A 


Hence we have to use (15). From (16) it follows that ci > F^(x'''). If 
^(x'^ < Ca < i we obtain on account of (15) and (16) 






1 




tfx'j + ^ Ck, 


Numerical calculation shows that for the values of N and k and the significance 
levels considered 

(17) (Tx'J + Eix!"^) < Cl . 

It can then be shown that for N ^ 450 and N ^ 300 respectively NAi decreases 
with N. A simple argument then shows that (16) and (17) are also true for 
all values N ≥ 450 and N ≥ 300 respectively. Hence the power with respect
to $F(\Delta_N, k)$ is < ½ for these values of N. Thus we see: For N ≥ 450 if the 5%
level is used, and for N ≥ 300 if the 1% level is used, the value

$k_N = 4\sqrt[5]{\dfrac{2(N-1)^2}{c^2}}$

can be considered for practical purposes as a best k. The value c
is determined so that $\dfrac{1}{\sqrt{2\pi}}\int_c^{\infty} e^{-t^2/2}\,dt$ is equal to the size of the critical region.
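As a rough numerical illustration of the rule just stated, the following sketch evaluates $k_N = 4[2(N-1)^2/c^2]^{1/5}$, taking c as the upper-tail normal deviate for the chosen size of the critical region. The function name `mann_wald_k` and its argument names are illustrative assumptions and do not appear in the paper.

```python
from statistics import NormalDist


def mann_wald_k(n_obs, level):
    """Approximate 'best' number of class intervals k_N (hypothetical helper).

    n_obs : sample size N
    level : size of the critical region, e.g. 0.05
    """
    # c satisfies (1/sqrt(2*pi)) * integral from c to infinity of exp(-t^2/2) dt = level
    c = NormalDist().inv_cdf(1.0 - level)
    return 4.0 * (2.0 * (n_obs - 1) ** 2 / c ** 2) ** 0.2


if __name__ == "__main__":
    # The two cases singled out in the text: N = 450 at 5% and N = 300 at 1%.
    for n_obs, level in [(450, 0.05), (300, 0.01)]:
        print(n_obs, level, round(mann_wald_k(n_obs, level), 1))
```

In practice the value returned would be rounded to a whole number of class intervals.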



LIMITED TYPE OF PRIMARY PROBABILITY DISTRIBUTION 
APPLIED TO ANNUAL MAXIMUM FLOOD FLOWS 

By Bradford F. Kimball
Port Washington, N. Y.

1. Theoretical statement of problem. There is no doubt that Gumbel’s
recent paper “The Return Period of Flood Flows” [1] has supplied an admirably
simple technique for engineers to use in approximating the trend of return periods
of annual maximum flood flows for purposes of extrapolation. This treatment
is scientifically of great interest because it introduces for the first time into a
subject already treated at considerable length by engineers, the theory of the
probability distribution of maximum values as developed by Fisher and Tippett,
von Mises, and others.¹ However, certain further observations should be
made concerning the approach used by Gumbel.

Let x represent the measure of daily stream flow having a probability
distribution w(x). Let the probability distribution of the associated annual maximum
stream flows be denoted by V(x) with

(1)  $W(x) = \int_0^x V(s)\,ds,$

denoting the probability that annual maxima be less than or equal to x. The
return period T(x) of an annual maximum flow of measure x is then defined by

(2)  $T(x) = \dfrac{1}{1 - W(x)}.$

In this paper the probability distribution w(x) will be called the primary
probability distribution associated with the probability distribution of maximum
values V(x) and its cumulative distribution W(x).

Gumbel argues that for the type of primary probability distribution that
might reasonably be expected to apply, W(x) will be of the type introduced by
R. A. Fisher¹

(3)  $W(x) = \exp[-e^{-\alpha(x - u)}].$

It is further implied that a primary probability distribution involving an upper 
limit would lead to a probability distribution of maximum values of the type 



for which moments of order k or higher do not exist. The inference is then 
drawn that a primary probability distribution leading to such a cumulative 
distribution of maximum values would seem to be less likely to be the correct 

¹ See references at end of Gumbel’s paper, loc. cit.



one than one leading to the distribution (3). To this argument we do not 
object; but we question the implied conclusion that hence the use of a limited
type of primary distribution is to be disallowed. 

If the primary probability distribution be of the limited Galton type 

(5) w(x) = K exp ( — 

where K is a constant and

(6)  $u = k[b - \log(a - x)], \qquad 0 \le x \le a,$

it can be shown that the limiting form of the cumulative distribution of maxima
of n values takes the same type form (3) where x is replaced by u. This can be
seen by observing that the transformed variate u becomes infinite as x approaches
a, and hence has infinite range to the right, which places (5) in the category of
distributions which are known to lead to cumulative distribution of maximum
values of form (3). More explicitly, considering w(x) as a finite distribution in
x, if one traces the reasoning as set forth in von Mises’ derivation [2] of the
limiting distribution (3), one finds that since the cumulative primary probability
$\int_0^x w(s)\,ds$ does not have a non-vanishing derivative of finite order at x = a,
what von Mises terms the case of a limited distribution does not apply, while
the argument for a cumulative distribution of maxima of form (3) does carry
through, in spite of the fact that x has limited range to the right. This fact
was not mentioned by Gumbel.

One is thus led to the conclusion that there is no logical exclusion of the
assumption of a primary probability distribution of the form (5).

One might well argue for a first approximation of the actual primary
probability distribution of stream flows (using any regular time interval such as a
day or an hour) of the form (5). Differentiating u with respect to x, one
obtains

(7)  $k\,dx = (a - x)\,du,$

which means that to a constant probability increment Δu there corresponds a
maximum increment Δx in measure of stream flow equal to (a/k) Δu when x
is at the lower limit zero. This corresponding increment in stream flow decreases
linearly to zero as x approaches its upper bound a, imposed because of the
existence of a finite watershed.

2. Technique of fitting probability distribution of maximum values in case
primary probability distribution is of the limited type (5)-(6). Write the
cumulative maximum distribution (3) in the form

(8)  $W(x) = \exp(-e^{-y}), \qquad y = \alpha(u(x) - u_1),$
     $u(x) = k[b - \log(a - x)], \qquad 0 \le x \le a.$





Now it is known that for the distribution

(9)  $dW = e^{-y - e^{-y}}\,dy,$

the mean value and standard deviation of y are given by

(10)  $\bar{y} = .577215$ (Euler’s constant C), $\qquad \sigma(y) = \pi/\sqrt{6}.$

Hence

$\bar{y} = \alpha[\bar{u}(x) - u_1] = \alpha k[(b - u_1/k) - \bar{L}] = C$

where $\bar{L}$ denotes the mean value of log (a − x), with x representing the observed
maximum flood flows. Also

$\sigma(y) = \alpha k\,\sigma(L) = \pi/\sqrt{6}$

where $\sigma(L)$ denotes the standard deviation of log (a − x). Hence

(11)  $\alpha k = (\pi/\sqrt{6})/\sigma(L), \qquad b - u_1/k = C/(\alpha k) + \bar{L},$

and y is determined as a function of x by the relation

(12)  $y = \alpha k[(b - u_1/k) - \log(a - x)].$

It is interesting to observe that it has not been necessary to determine the
constants k and b of the primary probability distribution. Only the upper
bound a and observed flood flows are used in this process. From the relation
(12) the theoretical curve in terms of x may easily be computed from tables
relating y to W (see Gumbel, loc. cit., Table II, page 173).

The difficulty of determining what the upper bound a should be in a specific
case is a practical one and does not concern the objective theoretical problem
of choosing the type of curve which most nearly describes the behavior of annual
maximum flood flows. The point to be made in this paper is that the use of
what seems to be a reasonable value of a will materially alter forecasts of future
annual flood flows relative to forecasts made on the assumption that such an
upper limit may be neglected. It is also ventured that the resulting theoretical
probability distribution of maxima will in general give a better fit to the series
of observed floods than one based on the latter premise. Techniques for
determination of the upper bound a will not be discussed in this paper.


3. Examples. In order to demonstrate the point in question the two methods
have been applied to a 57 year record of the annual flood flows of the Tennessee
River at Chattanooga for the years 1875 to 1931.²


² The author has already used this series in a previous article [3] and for this reason has
found it convenient to use it here.





TABLE I

Series of observed annual flood flows
(Tennessee River at Chattanooga, 1875-1931)

      (1)            (2)          (3)            (4)
Observed Flood    Ratio to    Per cent of    Return Period,
       x            Mean         Time             T(x)

     85.9           .412          0.88            1.009
    108             .518          2.63            1.027
    123             .590          4.39            1.046
    310            1.487         95.61           22.8
    349            1.674         97.37           38.0
    361            1.731         99.12          114.


In Table I, col. (1) is shown the incomplete series of observed annual floods in
units of 1,000 c.f.s. arranged in order of magnitude. The complete series may
be referred to in Water-Supply Paper 771 entitled “Floods in the United States,”
U. S. Geological Survey, 1936, p. 401. The mean annual maximum flood of this
series is 208.56. The ratio of each annual maximum to the mean is shown in
Col. (2). In Col. (4) is shown the observed return period which is taken here
as the harmonic mean between what has been called the exceedance interval and
the recurrence interval (see Gumbel, loc. cit., Table I, p. 167). Thinking of the
57 year record as a span of 57 years, the above procedure is equivalent to taking
the observed probability W(x) that a given annual flood will not be exceeded
as the mid-point of the part of this time-span covered by the observed flood in
question. Thus the lowest flood-peak 85,900 c.f.s. corresponds to the span
from zero to 1.754 per cent of the whole time-span, and hence W(x) is taken at
the mid-point, 0.877 per cent. Similarly the greatest flood, 361,000 c.f.s.,
corresponds to the interval from 98.246 to 100 per cent and is taken at 99.12 per
cent. These arithmetic means correspond to harmonic means of the “recurrence”
and “exceedance” intervals referred to above. This is the procedure
which Hazen [4] originally followed.
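The mid-point rule described above can be sketched in a few lines: the i-th smallest of n annual floods is assigned W = (i − ½)/n, and the return period follows from T = 1/(1 − W). The function name `plotting_positions` is an illustrative assumption, not notation from the paper.

```python
def plotting_positions(n_years):
    """Return a list of (W, T) pairs for ranks 1..n_years (smallest flood first).

    W is the mid-point of the share of the time-span covered by each flood,
    and T = 1 / (1 - W) is the corresponding observed return period.
    """
    positions = []
    for rank in range(1, n_years + 1):
        w = (rank - 0.5) / n_years
        positions.append((w, 1.0 / (1.0 - w)))
    return positions


if __name__ == "__main__":
    pos = plotting_positions(57)
    # Three lowest and three highest floods of a 57 year record.
    for w, t in pos[:3] + pos[-3:]:
        print(f"{100 * w:6.2f} per cent   T = {t:8.3f}")
```

For a 57 year record this gives the per cent of time and return period columns of Table I directly from the rank of each flood.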

Data from Cols. (1) and (4) of this table determined position of dots on Fig. 1.
Data from Cols. (2) and (3) gave the points indicated by dots on Fig. 2, with
1 − W(x) recorded on the chart rather than W(x).

The two theoretical distributions fitted to these annual flood maxima will be 
referred to as distributions A and B. 

Distribution A. In this case the limited type of primary probability
distribution (5)-(6) is assumed. From previous studies of this data series made by
the author [3], an upper limit of annual floods of some 609,000 c.f.s. was found
to be reasonable, and for purposes of this example the same upper limit will be
assumed for the primary probability distribution. Thus the transformation
(6) becomes:


$u = k[b - \log(609 - x)],$






Fig. 1. Comparison of methods of fitting annual flood peaks (Tennessee River at
Chattanooga, 1875-1931): return periods plotted against annual flood discharges on
semi-logarithmic chart.

where the logarithm to base 10 can be used without loss of generality since the
constant k will absorb the conversion factor. The mean value of the logarithm,
and its standard deviation come to

$\bar{L} = 2.59772, \qquad \sigma(L) = .06576.$

The constants of the transformation (12) are thus determined by

$\alpha k = (\pi/\sqrt{6})/(.06576), \qquad b - u_1/k = C/(\alpha k) + 2.59772.$






Thus

$1/(\alpha k) = .05127, \qquad b - u_1/k = 2.6273$

and solving (12) for log (609 − x),

(13)  $\log(609 - x) = 2.6273 - (.05127)\,y.$

Fig. 2. Comparison of methods of fitting annual flood peaks (Tennessee River at
Chattanooga, 1875-1931): data plotted on logarithmic probability chart designed by
Hazen, Whipple and Fuller.

Using a table for the known relations between y, W(x), and T(x) for the
Fisher-Tippett distribution of maximum values similar to Table II of Gumbel’s article
(loc. cit.) the corresponding values x of the annual floods are easily determined.
Thus a theoretical relation between x and W(x) is set up. This is indicated as
Curve A on the two charts exhibited here.
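The fitting steps for Distribution A can be sketched as follows, assuming the relations (11)-(13) with base-10 logarithms and the quoted values of the mean and standard deviation of log (609 − x). The function names and the use of a closed-form inversion of W in place of Gumbel's table are illustrative assumptions.

```python
import math

EULER_C = 0.577215  # Euler's constant as quoted in the text


def fit_limited(mean_log, sd_log):
    """Return (1/(alpha*k), b - u1/k) from relation (11)."""
    inv_ak = sd_log * math.sqrt(6.0) / math.pi
    offset = EULER_C * inv_ak + mean_log
    return inv_ak, offset


def flood_for_return_period(t_years, upper, inv_ak, offset):
    """Flood x (same units as `upper`) whose return period is t_years.

    Uses T = 1/(1 - W), W = exp(-exp(-y)) and
    log10(upper - x) = offset - inv_ak * y, as in relation (13).
    """
    w = 1.0 - 1.0 / t_years
    y = -math.log(-math.log(w))
    return upper - 10.0 ** (offset - inv_ak * y)


if __name__ == "__main__":
    inv_ak, offset = fit_limited(2.59772, 0.06576)
    print(round(inv_ak, 5), round(offset, 4))
    print(round(flood_for_return_period(1000.0, 609.0, inv_ak, offset), 1))
```

By construction the fitted flood never exceeds the assumed upper bound of 609 (thousand c.f.s.), however long the return period.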

Distribution B. The primary probability distribution in this case is taken
as unlimited to the right, and in general is assumed to have the character of an
exponentially decreasing function of the measure of stream flow x (see Gumbel,
loc. cit.). The parameter y of the distribution of annual maxima is given
directly by

$y = \alpha(x - x_1)$

and

$1/\alpha = (\sqrt{6}/\pi)$ (stand. dev. of annual floods) $= (.77970)(58.26) = 45.425$
$x_1 =$ (mean annual flood) $- C/\alpha = 208.6 - (.57722)(45.425) = 182.4$

Hence

(14)  $x = 182.4 + (45.425)\,y$

and using the table of corresponding values of y, W(x) and T(x) for the
Fisher-Tippett distribution referred to above, a theoretical relation between x and
W(x) is easily set up. This is plotted as Curve B on the accompanying charts.
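The corresponding computation for Distribution B can be sketched in the same way, assuming y = α(x − x₁) with 1/α and x₁ obtained from the mean and standard deviation of the annual floods as above. The function names are illustrative assumptions, and the closed-form inversion of W stands in for the table of y, W(x) and T(x).

```python
import math


def fit_unlimited(mean_flood, sd_flood, euler_c=0.57722):
    """Return (1/alpha, x1) for the unlimited (Fisher-Tippett) fit."""
    inv_alpha = math.sqrt(6.0) / math.pi * sd_flood
    x1 = mean_flood - euler_c * inv_alpha
    return inv_alpha, x1


def flood_for_return_period_b(t_years, inv_alpha, x1):
    """Flood x whose return period is t_years, from x = x1 + y/alpha."""
    w = 1.0 - 1.0 / t_years          # T = 1/(1 - W)
    y = -math.log(-math.log(w))      # W = exp(-exp(-y))
    return x1 + inv_alpha * y


if __name__ == "__main__":
    inv_alpha, x1 = fit_unlimited(208.6, 58.26)
    print(round(inv_alpha, 3), round(x1, 1))
    print(round(flood_for_return_period_b(1000.0, inv_alpha, x1), 1))
```

Because x grows linearly in y here, the fitted flood increases without bound as the return period lengthens, which is the feature that separates Curve B from a curve with an upper limit.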

4. Discussion of examples. In Fig. 1 it is to be noted that if theoretical 
curves are continued to the right to give readings for a return period of 1,000 
years, the divergence of Curve A from Curve B is large enough to be of sig¬ 
nificance, numerically. Visual inspection does not indicate which curve is the 
better fit to the observation points. 

In Fig. 2 the curves are plotted on "logarithmic probability” graph paper. 
This paper was designed by Hazen and Fuller [4] specifically for the purpose of 
plotting annual maxima of stream-flows. A significant divergence in trend is 
to be noted at the right hand end. 

These charts indicate that the use of an upper limit may materially affect 
extrapolation of fitted theoretical curves, for purposes of estimating floods with 
a return period, say of 1,000 years. 

If the trends of observed floods in Gumbel's recent paper in the Transactions
of the American Geophysical Union [5] are examined, it will be observed that
in the case of the Connecticut, Mississippi and Rhone rivers, there is a decided 
tendency for the curve of observed floods to turn downwards, away from the 
theoretical curves, which correspond to Curve B exhibited in Figure 1. In 
the case of the Tennessee, Cumberland and Columbia rivers the tendency is 
not decisive, while in the case of the Rhine river at Basel (Switzerland) the 
tendency of the observed curve is upwards rather than downwards. As the 
writer has observed elsewhere [6], this last data series seems to be rather unique 
in character and is possibly the result of a watershed greatly influenced by 
all year around snow deposits. Possibly a radically different primary prob¬ 
ability distribution should be used in this case. 

5. Conclusion. The writer has demonstrated in this paper that in fitting a
theoretical probability distribution of maximum values to annual maxima of 
stream flows, the use of an upper bound for measures of stream flow by assump¬ 
tion of a primary probability distribution of the type (5)-(6) 

(1) is not inconsistent with the use of the Fisher-Tippett distribution of 
maxima, 

(2) has a reasonable logical basis from the point of view of the hydrologist, 





(3) may materially affect the estimation of return periods when extrapolation 
is involved, relative to results obtained when no upper bound is assumed. 

It has not been within the scope of this paper to discuss techniques for de¬ 
termining such an upper bound, nor to apply the theory to enough data series 
to draw conclusions concerning goodness of fit. 

REFERENCES 

[1] E. J. Gumbel, “The return period of flood flows,” Annals of Math. Stat., Vol. 12
    (1941), pp. 163-190.

[2] Richard von Mises, “La distribution de la plus grande de n valeurs,” Revue
    Mathématique de l’Union Interbalkanique, Vol. 1, Athens (1936).

[3] Bradford F. Kimball, “Probability-distribution-curve for flood-control studies,”
    Trans. Am. Geophysical Union, 1938, pp. 466-475.

[4] Allen Hazen, Flood-flows, John Wiley & Sons, New York, 1930.

[5] E. J. Gumbel, “Probability-interpretation of the observed return-periods of floods,”
    Trans. Am. Geophysical Union, 1941, pp. 836-850.

[6] Bradford F. Kimball, Discussion of paper entitled “Statistical control curves for
    flood discharges” by E. J. Gumbel, Trans. Am. Geophysical Union, 1942.



LINEAR RESTRICTIONS ON CHI-SQUARE


By Franklin E. Satterthwaite
University of Iowa


Chi-square is a statistic widely used in statistical analysis. It is usually of
the form,

(1)  $\chi^2 = \sum_{i=1}^{n} \chi_i^2,$

where the $\chi_i$’s are independent normally distributed variables drawn from
populations with respective means and standard deviations, $m_i$ and $\sigma_i$. In practical
problems the independence of the $\chi_i$’s is often modified by placing restrictions
on the $\chi_i$’s in order to estimate the $m_i$’s or $\sigma_i$’s. It is well known that if m such
restrictions which are linear and homogeneous (also algebraically independent)
are placed on the $\chi_i$’s, then the resulting chi-square, (1), is distributed according
to the chi-square distribution with n − m degrees of freedom. The purpose of
this paper is to study the case where the restrictions are not necessarily
homogeneous.


1. Geometrical development. The $\chi_i$’s of equation (1) may be considered
as co-ordinates in an n-dimensional space. Equation (1) represents a sphere in
such a space with its center at the origin and with radius, $\chi$. We should like
to determine the distribution of $\chi^2$. First, since the $\chi_i$’s are independent, we
may form their joint distribution,¹

(2)  $F(\chi_1, \chi_2, \cdots, \chi_n)\,dV = K \prod_i e^{-\chi_i^2/2}\,d\chi_i$
     $= K e^{-\sum \chi_i^2/2}\,d\chi_1\,d\chi_2 \cdots d\chi_n$
     $= K e^{-\chi^2/2}\,dV.$

We may change the variable in (2) to $\chi^2$ if we can determine dV. Since the
n-dimensional sphere represented by equation (1) has a volume proportional
to $\chi^n$, we may write

$dV = K\,d(\chi^2)^{n/2} = K(\chi^2)^{n/2 - 1}\,d\chi^2.$

Substituting this value in the distribution (2) we obtain for the distribution of
chi-square,

$F(\chi^2)\,d\chi^2 = K(\chi^2)^{n/2 - 1} e^{-\chi^2/2}\,d\chi^2,$

which is the usual form of the chi-square distribution for n degrees of freedom.

¹ The letter K will be used throughout as a constant, not necessarily the same constant
from equation to equation.




We shall next restrict the values of $\chi_i$ by means of a condition,

(3)  $a_{11}\chi_1 + a_{12}\chi_2 + \cdots + a_{1n}\chi_n = \rho_1, \qquad \sum_j a_{1j}^2 = 1,$

where $\rho_1$ is a constant. This restriction represents a hyper-plane in our
n-dimensional space at a distance $\rho_1$ from the origin. The intersection of this
hyper-plane with our sphere (1) is an (n − 1)-dimensional sphere with radius

$\chi' = (\chi^2 - \rho_1^2)^{1/2}.$

The differential of the volume of this sphere is

$dV = K(\chi^2 - \rho_1^2)^{(n-3)/2}\,d\chi^2.$

Substituting this in the distribution (2) we obtain the distribution of chi-square
subject to the single linear restriction, (3). Thus

$F(\chi^2)\,d\chi^2 = K(\chi^2 - \rho_1^2)^{(n-3)/2} e^{-\chi^2/2}\,d\chi^2,$

or more conveniently,

$F(\chi^2 - \rho_1^2)\,d(\chi^2 - \rho_1^2) = K(\chi^2 - \rho_1^2)^{(n-3)/2} e^{-(\chi^2 - \rho_1^2)/2}\,d(\chi^2 - \rho_1^2).$

The argument may be readily extended to include additional linear
restrictions of the form,

     $a_{21}\chi_1 + a_{22}\chi_2 + \cdots + a_{2n}\chi_n = \rho_2, \qquad \sum_j a_{2j}^2 = 1,$
(4)  $\cdots\cdots\cdots$
     $a_{m1}\chi_1 + a_{m2}\chi_2 + \cdots + a_{mn}\chi_n = \rho_m, \qquad \sum_j a_{mj}^2 = 1.$

For convenience we shall assume that the restrictions form an orthogonal set²
so that

$\sum_j a_{ij} a_{kj} = 0, \qquad i \ne k.$

The hyper-plane represented by equation (4) is at a distance, $\rho_2$, from the origin.
Since (4) is orthogonal to (3), it is also at a distance, $\rho_2$, from the center of the
(n − 1)-dimensional sphere obtained on applying the first restriction. Therefore
the intersection of this hyper-plane with the (n − 1)-dimensional sphere
will give an (n − 2)-dimensional sphere of radius

$\chi'' = (\chi^2 - \rho_1^2 - \rho_2^2)^{1/2}.$

Similarly, if we consider all m restrictions, we obtain an (n − m)-dimensional
sphere with radius

$\chi^{(m)} = (\chi^2 - \sum \rho_i^2)^{1/2}.$


² Any set of linear restrictions which are algebraically independent and consistent may
be replaced by an orthogonal set. Thus if (4) were not orthogonal to (3), we could replace
(4) by (4) − k(3) where k is determined by the condition

$\sum_j a_{1j}(a_{2j} - k a_{1j}) = 0$

or

$\sum_j a_{1j} a_{2j} = k \sum_j a_{1j}^2.$





The differential of the volume of this sphere will be

$dV = K(\chi^2 - \sum \rho_i^2)^{(n-m-2)/2}\,d(\chi^2 - \sum \rho_i^2).$

Substituting this in (2) we see that

$(\chi^{(m)})^2 = \chi^2 - \sum \rho_i^2$

is distributed as is chi-square with n − m degrees of freedom.


2. Alternate analytic development. It is perhaps desirable that we present
an analytic proof of the foregoing theorem. Therefore we shall first regard the
$\rho_i$’s as variables and shall determine the joint distribution of $\chi^2$ and the $\rho_i$’s.
We may then pass to the distribution of those values of $\chi^2$ which correspond to
assigned values of the $\rho_i$’s. Note that the $\chi_i$’s are considered to be statistically
independent.

The characteristic function of the joint distribution of $\chi^2$ and the $\rho_i$’s is
known to be³

g~0/!U-S(l) 

where 

Q == 2 (HkOiltUtj 

= 2 » since 2^ d/* = Siy ■ 

Applying the Fourier transform, we obtain the joint distribution of x* and 
the p/s: 

r(x’, K /■■■ / di. 

where 


Q' = - itx^ _ - {2;(y2{l - 2f/)) 


,, 2 nti + fpyd - 2t0]* 

^ 2(1 - 2it) 


- ~ 2f02p/. 


Performing the integration with respect to 4 we have, 

r 

F « ife-**'’? J 

and finally, 


F =* iC(x’ - 


³ See A. T. Craig, “A certain mean value problem in statistics,” Bull. Amer. Math. Soc.,
Vol. 42 (1936), p. 671.





In our problem we want the distribution of $\chi^2$ (or more conveniently, of $\chi^2 - \sum \rho_i^2$)
when the $\rho_i$’s take on fixed values. To obtain this we substitute fixed values,
$\rho_i$’s, into the joint distribution and divide by the marginal total,

j Fix', PI. • • • Pn,) = iCrlKn - 
This gives us the distribution function, 

Fix - Sp) = 2r[^-7n)] 

which is a chi-square distribution with n — m degrees of freedom. 

3. Application. As an example of the use of linear restrictions on chi-square
we shall now examine the effect on the chi-square test of goodness of fit if the
moments of a sample are not corrected for grouping errors in fitting a frequency
curve.

The parameters of the fitted frequency distribution, f(x), are determined from
the equations,

(5)  $N \int x^k f(x)\,dx = \sum_j x_j^k\,\theta_j, \qquad k = 0, 1, 2, \cdots,$

where $x_j$ is the mid-point of the jth group and $\theta_j$ the corresponding observed
frequency. Next a set of expected frequencies,

$\hat{\theta}_j = \int_{a_j}^{a_{j+1}} N f(x)\,dx, \qquad a_j = (x_{j-1} + x_j)/2,$

is determined by taking partial areas of the fitted frequency distribution. The
expected frequency is used to transform the actual frequency into a statistic
with mean zero and unit variance by the equation,

$\chi_j = (\theta_j - \hat{\theta}_j)/\hat{\theta}_j^{1/2}.$

Equations (5) may now be rearranged into the form of linear restrictions on the
$\chi_j$. Thus

(6)  $\sum_j x_j^k\,\hat{\theta}_j^{1/2}\,\chi_j = \rho_k$

where the $\rho_k$ have the values,

$\rho_k = \sum_j x_j^k\,\theta_j - \sum_j x_j^k\,\hat{\theta}_j$
$\quad\; = N \int x^k f(x)\,dx - \sum_j x_j^k\,\hat{\theta}_j$
$\quad\; \ne 0$ in general.





To make our example more specific, let us fit a normal distribution to a sample
of 1000 items with mean zero and unit variance. Let the grouping be about
the midpoints,

$x_j$: −3, −2, −1, 0, 1, 2, 3.

The expected frequencies in each group are

6, 61, 242, 382, 242, 61, 6.

The variance of these expected frequencies is 1.080 as contrasted with 1.000
for the sample. The linear restrictions, (6), now take the forms,

(7)   $2.4\chi_{-3} + 7.8\chi_{-2} + 15.6\chi_{-1} + 19.5\chi_0 + 15.6\chi_1 + 7.8\chi_2 + 2.4\chi_3 = 0$

(8)   $-7.2\chi_{-3} - 15.6\chi_{-2} - 15.6\chi_{-1} + 0 + 15.6\chi_1 + 15.6\chi_2 + 7.2\chi_3 = 0$

(9)   $21.6\chi_{-3} + 31.2\chi_{-2} + 15.6\chi_{-1} + 0 + 15.6\chi_1 + 31.2\chi_2 + 21.6\chi_3 = -80.$

Because of the symmetry of the normal distribution, restriction (8) is orthogonal
to (7) and (9). Therefore the only orthogonalization necessary is to replace
(9) by an equivalent restriction which is orthogonal to (7). This can be done
by subtracting 1.080 times (7) from (9) which gives

(10)  $19.0\chi_{-3} + 22.8\chi_{-2} - 1.2\chi_{-1} - 21.1\chi_0 - 1.2\chi_1 + 22.8\chi_2 + 19.0\chi_3 = -80.$

If these restrictions are each divided by the square root of the sum of the squares
of the coefficients of the $\chi_j$, they will be the normal orthogonal set required
by the development. The distances of these restrictive planes from the center
of the $\chi^2$-sphere are

$\rho_{(7)} = 0, \qquad \rho_{(8)} = 0, \qquad \rho_{(10)} = 1.7.$

Thus if we test the goodness of fit of the normal distribution to this sample by
calculating chi-square,

$\chi^2 = \sum_j \chi_j^2,$

we should subtract from $\chi^2$ a correction of

$\sum \rho_i^2 = 2.8$

before judging the significance. This correction adjusts for the effect of the
grouping error on the chi-square test.
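The arithmetic of this example can be retraced with a short sketch, assuming the seven midpoints and the rounded expected frequencies quoted above and a fitted normal curve with moments 1, 0, 1 of orders 0, 1, 2. All function and variable names are illustrative assumptions. Working from unrounded square roots rather than the one-decimal coefficients of (7)-(10), the result agrees with the quoted distance of 1.7 and lies close to the quoted correction.

```python
import math

MIDPOINTS = [-3, -2, -1, 0, 1, 2, 3]
EXPECTED = [6, 61, 242, 382, 242, 61, 6]   # expected frequencies quoted in the text
N_ITEMS = 1000
NORMAL_MOMENTS = [1.0, 0.0, 1.0]           # moments of order 0, 1, 2 of the fitted curve


def restriction(order):
    """Coefficients and right-hand side of restriction (6) for a moment order."""
    coeffs = [x ** order * math.sqrt(e) for x, e in zip(MIDPOINTS, EXPECTED)]
    rhs = N_ITEMS * NORMAL_MOMENTS[order] - sum(
        x ** order * e for x, e in zip(MIDPOINTS, EXPECTED))
    return coeffs, rhs


def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def grouping_correction():
    """Return (orthogonalizing factor, plane distances, total correction)."""
    c7, r7 = restriction(0)
    c8, r8 = restriction(1)
    c9, r9 = restriction(2)
    # (8) is orthogonal to (7) and (9) by symmetry; only (9) is adjusted.
    factor = dot(c9, c7) / dot(c7, c7)
    c10 = [u - factor * v for u, v in zip(c9, c7)]
    r10 = r9 - factor * r7
    # Distance of each plane from the origin: |rhs| / length of coefficient vector.
    rhos = [abs(r) / math.sqrt(dot(c, c))
            for c, r in ((c7, r7), (c8, r8), (c10, r10))]
    return factor, rhos, sum(r * r for r in rhos)


if __name__ == "__main__":
    factor, rhos, correction = grouping_correction()
    print(round(factor, 3), [round(r, 2) for r in rhos], round(correction, 2))
```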

In this example, chi-square has four degrees of freedom so that an error of
2.8 is large enough to affect our judgment of its significance. It can be shown
that the correction is proportional to the size of the sample. Therefore, if our
sample had contained only 100 items, the fit obtained by ignoring grouping
effects would be almost as good as the fit when the sample moments were
corrected for grouping. On the other hand, if the sample had 10,000 items, it
would be practically impossible to obtain a satisfactory fit without correcting
for grouping errors.


4. Conclusion. The theory of the loss of degrees of freedom for chi-square
when the underlying statistics are subject to linear restrictions does not require
the restrictions to be homogeneous. For restrictions which are not homogeneous,
a correction must be subtracted from chi-square equal to the square of the
distance from the center of the sphere,

$\chi^2 = \sum \chi_i^2 = 0,$

to the intersection of the restrictive planes. Non-homogeneous restrictions
sometimes arise in practice because of the bias introduced by an approximation.
An example is given from curve fitting.



SYSTEMS OF LINEAR EQUATIONS WITH COEFFICIENTS SUBJECT
TO ERROR

By A. T. Lonseth
Iowa State College

1. Introduction. Various scientific problems lead to non-homogeneous systems
of n linear equations in n unknowns, in which the n² + n coefficients
(including “absolute” terms) are subject to error. Such errors may be errors of
observation, or errors introduced by rounding off decimal expansions. If the
system has a non-vanishing determinant, the ordinary rules yield the solution.
But the question arises: how may the possible errors in the coefficients affect the
solutions? In particular, one would like to know how to exclude the fatal event
that some malicious combination of errors might make the determinant zero.
One would further like to have limitations on the solution-errors in terms of
maximum coefficient-errors. Considering the coefficient-errors as random
variables, one may also inquire as to the probability distributions of the
solution-errors.

The principal result obtained in this paper is the Taylor's expansion of the 
error in any unknown, considered as a function of the n(n + 1) errors in the
coefficients. An upper bound is obtained for each term of this series, and the 
sum of these upper bounds (when convergent) is expressed in closed form. Thus 
are obtained not only approximations to the maximum error, but an actual upper 
limit. Convergence of the power series is established for sufficiently small 
coefficient-errors; “sufficient smallness” is specified in terms of a simple criterion, 
which simultaneously provides a sufficient condition for the non-vanishing of a 
determinant with elements subject to error. 

These results were obtained before I learned that work had already been done 
on the problem. The earliest seems to be that of F. R. Moulton [2] in 1913; he 
found the first order approximation (6) for n = 3, and discussed the geometrical 
reasons for sensitivity. Much later I. M. H. Etherington [1], evidently un¬ 
aware of Moulton's paper, found the expression for the total error of a deter¬ 
minant whose elements may be in error, and applied this to the present problem. 
He thus found limits for the first and second order errors, in a rather different 
form from mine. The probabilistic considerations of section 6 were suggested 
by Etherington's article. L. B. Tuckerman [3] recently discussed the question 
of estimating computational errors incurred in the course of solution. He con¬ 
sidered only errors of first order. 

My original procedure was to compute the terms of the Taylor's series as 
successive differentials of the unknown, from Cramer's formula. This soon be¬ 
comes laborious, and I found only the first two terms. The linear matrix equa¬ 
tion (4) was then kindly suggested to me by R. Oldenburger. Here (4) is solved 
by iteration, resulting in a simple recursion formula for successive terms of the 
Taylor's series. 




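2. Formal matrix solution. Let the system of equations be

(1)  $\sum_{j=1}^{n} a_{ij} x_j = c_i, \qquad i = 1, 2, \cdots, n.$

In terms of the matrices

$A = \begin{pmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & & \vdots \\ a_{n1} & \cdots & a_{nn} \end{pmatrix}, \qquad X = \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}, \qquad C = \begin{pmatrix} c_1 \\ \vdots \\ c_n \end{pmatrix},$

system (1) can be written

(2)  $AX = C.$

Supposing that not all c’s vanish, and that $\Delta$, the determinant of A, does not
vanish, there is a unique solution X. But the a’s and c’s, and consequently the
x’s, are subject to error: let the true value of $a_{ij}$ be $a_{ij} + \alpha_{ij}$; of $c_i$, $c_i + \gamma_i$;
and of the resulting $x_j$, $x_j + \xi_j$. We must actually deal with the system

(3)  $(A + a)(X + x) = C + c,$

where we have written

$a = \begin{pmatrix} \alpha_{11} & \cdots & \alpha_{1n} \\ \vdots & & \vdots \\ \alpha_{n1} & \cdots & \alpha_{nn} \end{pmatrix}, \qquad x = \begin{pmatrix} \xi_1 \\ \vdots \\ \xi_n \end{pmatrix}, \qquad c = \begin{pmatrix} \gamma_1 \\ \vdots \\ \gamma_n \end{pmatrix}.$

Expanding (3) and using (2), we find for the error-matrix x

(4)  $x = m + nX + nx,$

with $m = A^{-1}c$, $n = -A^{-1}a$; $A^{-1}$ is the inverse of A. We solve (4) formally
for x by iteration. Thus

$x = m + nX + n(m + nX) + n^2 x,$ etc.;

and there results the infinite expansion

(5)  $x = \sum_{k=1}^{\infty} x^{(k)}, \qquad x^{(1)} = m + nX; \quad x^{(k)} = n x^{(k-1)}, \quad k > 1.$

In section 4 convergence of (5) will be established for sufficiently small $|\alpha_{ij}|$.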
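The iteration behind expansion (5) can be sketched for a 2 × 2 system, assuming plain Python lists for the matrices. The helper names `inv2`, `matvec` and `error_series`, and the fixed number of terms, are illustrative assumptions; the sketch simply accumulates $x^{(1)} = A^{-1}(c - aX)$ and $x^{(k)} = -A^{-1}a\,x^{(k-1)}$.

```python
def inv2(m):
    """Inverse of a 2 x 2 matrix given as nested lists."""
    (p, q), (r, s) = m
    det = p * s - q * r
    return [[s / det, -q / det], [-r / det, p / det]]


def matvec(m, v):
    """Product of a matrix (nested lists) with a vector (list)."""
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in m]


def error_series(A, C, a, c, terms=30):
    """Return (X, x): the unperturbed solution and the summed error series (5)."""
    A_inv = inv2(A)
    X = matvec(A_inv, C)
    # First-order term: x^(1) = m + nX = A^{-1}(c - aX)
    aX = matvec(a, X)
    term = matvec(A_inv, [ci - vi for ci, vi in zip(c, aX)])
    total = list(term)
    for _ in range(terms - 1):
        # Higher-order terms: x^(k) = n x^(k-1) = -A^{-1} a x^(k-1)
        term = [-t for t in matvec(A_inv, matvec(a, term))]
        total = [s + t for s, t in zip(total, term)]
    return X, total


if __name__ == "__main__":
    A = [[2.0, 1.0], [1.0, 3.0]]
    C = [3.0, 5.0]
    a = [[0.01, -0.02], [0.015, 0.005]]   # illustrative coefficient errors
    c = [0.02, -0.01]                     # illustrative errors in the absolute terms
    X, x = error_series(A, C, a, c)
    print(X, x)
```

For small coefficient-errors the sum X + x should agree with the direct solution of the perturbed system (3), which is a convenient check on the recursion.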

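3. The elements of $x^{(k)}$. It is necessary to consider closely the individual
elements of $x^{(k)}$. Writing

$x^{(k)} = \begin{pmatrix} \xi_1^{(k)} \\ \vdots \\ \xi_n^{(k)} \end{pmatrix},$

we note from (5) that

$\xi_j = \sum_{k=1}^{\infty} \xi_j^{(k)};$

this is precisely the Taylor’s series for the error in $x_j$: each $\xi_j^{(k)}$ is a homogeneous
polynomial of degree k in the $\alpha$’s and $\gamma$’s. Writing $A_{ij}$ for the cofactor of $a_{ij}$
in $\Delta$,

$x^{(1)} = m + nX = A^{-1}(c - aX) = \dfrac{1}{\Delta}\begin{pmatrix} A_{11} & \cdots & A_{n1} \\ \vdots & & \vdots \\ A_{1n} & \cdots & A_{nn} \end{pmatrix}(c - aX),$

whence (summing hereafter from 1 to n on Greek-letter subscripts)

(6)  $\xi_j^{(1)} = \dfrac{1}{\Delta}\Big(\sum_\nu \gamma_\nu A_{\nu j} - x_1 \sum_\nu \alpha_{\nu 1} A_{\nu j} - \cdots - x_n \sum_\nu \alpha_{\nu n} A_{\nu j}\Big).$

From (5), if k > 1,

$x^{(k)} = n x^{(k-1)} = -\dfrac{1}{\Delta}\begin{pmatrix} \sum_\nu \alpha_{\nu 1} A_{\nu 1} & \cdots & \sum_\nu \alpha_{\nu n} A_{\nu 1} \\ \vdots & & \vdots \\ \sum_\nu \alpha_{\nu 1} A_{\nu n} & \cdots & \sum_\nu \alpha_{\nu n} A_{\nu n} \end{pmatrix} x^{(k-1)},$

so that

(7)  $\xi_j^{(k)} = -\dfrac{1}{\Delta} \sum_\mu \xi_\mu^{(k-1)} \sum_\nu \alpha_{\nu\mu} A_{\nu j}.$

The sums $\sum_\nu \gamma_\nu A_{\nu j}$, $\sum_\nu \alpha_{\nu i} A_{\nu j}$ have obvious interpretations as determinants.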

4. Bounds and convergence of the series. Assuming $|\alpha_{ij}|, |\gamma_i| \le \delta$ and taking absolute values in (6),

(8)  $|\xi_j^{(1)}| \le \frac{\delta}{|\Delta|}\Big(1 + \sum_\nu |x_\nu|\Big)\Big(\sum_\mu |A_{\mu j}|\Big).$

It will be observed that equality can be attained for a particular choice of α's and γ's as $\pm\delta$: the bound for first-order errors is best possible. But it is not in general possible by a single choice of α's and γ's to obtain equality for all $j$. Similarly from (7)

$|\xi_j^{(k)}| \le \frac{\delta}{|\Delta|}\Big(\sum_\nu |\xi_\nu^{(k-1)}|\Big)\Big(\sum_\mu |A_{\mu j}|\Big), \quad k > 1,$

whence by induction

(9)  $|\xi_j^{(k)}| \le \Big(\frac{\delta}{|\Delta|}\Big)^k \Big(1 + \sum_\nu |x_\nu|\Big)\Big(\sum_\mu \sum_\nu |A_{\mu\nu}|\Big)^{k-1}\Big(\sum_\mu |A_{\mu j}|\Big).$

Summing on $k$,

$\sum_{k=1}^{m} |\xi_j^{(k)}| \le \frac{\delta}{|\Delta|}\Big(1 + \sum_\nu |x_\nu|\Big)\Big(\sum_\mu |A_{\mu j}|\Big)\Big(\sum_{k=1}^{m} \rho^{k-1}\Big),$

with

$\rho = \frac{\delta}{|\Delta|} \sum_\mu \sum_\nu |A_{\mu\nu}|.$

If $\rho < 1$, we can let $m \to \infty$;

(10)  $|\xi_j| \le \frac{\delta}{|\Delta|}\Big(1 + \sum_\nu |x_\nu|\Big)\Big(\sum_\mu |A_{\mu j}|\Big) \Big/ (1 - \rho).$

Observing that the γ's occur linearly in (6) and (7), we conclude that (5) converges if

(11)  $|\alpha_{ij}| \le \delta < |\Delta| \Big/ \Big(\sum_\mu \sum_\nu |A_{\mu\nu}|\Big).$

It follows that the determinant of the system (3) cannot vanish if (11) holds. This is rather remarkable, in that $\delta \sum_\mu \sum_\nu |A_{\mu\nu}|$ is merely the maximum first-order term in the error of that determinant ([1], p. 108); the effect of higher order terms (i.e., of any but first-order minors) in producing a zero determinant can be wholly ignored.

From the remark after (8), it appears that equality in (9) and (10) cannot generally be attained.

If (10) is written $|\xi_j| \le B/(1 - \rho)$, it is easily seen that the remainder after the $k$th approximation does not exceed $\rho^k B/(1 - \rho)$.
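The bound (10) and the condition (11) can be checked on a small case. The sketch below assumes a hypothetical 2x2 system and a hypothetical $\delta$; it computes $\rho$ and the right-hand side of (10), then compares them with the exact errors obtained when every α and γ is set to $\pm\delta$ in all 64 sign combinations. It illustrates the inequality and is not a proof.

```python
from fractions import Fraction as Fr
from itertools import product

# Hypothetical data (not taken from the paper).
A = [[Fr(4), Fr(1)], [Fr(2), Fr(3)]]
c = [Fr(1), Fr(2)]
delta = Fr(1, 100)

def det2(M):
    return M[0][0] * M[1][1] - M[0][1] * M[1][0]

def solve2(M, rhs):
    """Solve a 2x2 system by Cramer's rule."""
    d = det2(M)
    return [(rhs[0] * M[1][1] - M[0][1] * rhs[1]) / d,
            (M[0][0] * rhs[1] - rhs[0] * M[1][0]) / d]

Delta = det2(A)
# cof[mu][nu] is the cofactor of a_{mu nu} in A (2x2 case).
cof = [[A[1][1], -A[1][0]], [-A[0][1], A[0][0]]]
x = solve2(A, c)

rho = delta * sum(abs(v) for row in cof for v in row) / abs(Delta)
B = [delta / abs(Delta) * (1 + sum(abs(v) for v in x))
     * sum(abs(cof[mu][j]) for mu in range(2)) for j in range(2)]  # bound (8)
bound10 = [b / (1 - rho) for b in B]                               # bound (10)

worst = [Fr(0), Fr(0)]
for s in product((-1, 1), repeat=6):
    A_true = [[A[0][0] + s[0] * delta, A[0][1] + s[1] * delta],
              [A[1][0] + s[2] * delta, A[1][1] + s[3] * delta]]
    c_true = [c[0] + s[4] * delta, c[1] + s[5] * delta]
    xi = [t - v for t, v in zip(solve2(A_true, c_true), x)]
    worst = [max(w, abs(e)) for w, e in zip(worst, xi)]

print(float(rho), [float(b) for b in bound10], [float(w) for w in worst])
```

Here $\Delta = 10$ and the absolute cofactors sum to 10, so (11) requires $\delta < 1$; the chosen $\delta$ gives $\rho = 1/100$.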

5. Probability distributions. We now consider some consequences of the following assumptions: the α's and γ's are identical, independent random variables, bounded by a $\delta$ satisfying (11), and distributed symmetrically about zero. (It would be reasonable to assume further that they possess a frequency function which is nowhere concave upward.) Writing $E(x)$ for "expectation of the random variable x," we have

$E(\alpha_{ij}) = E(\gamma_i) = 0, \quad E(\alpha_{ij}^2) = E(\gamma_i^2) = \sigma^2 < \delta^2.$

On account of independence and symmetry, the expectation of any power-product of α's and γ's containing an odd power must be zero. To first order, the mean of the solution-error is approximated by

(12)  $a_j^{(1)} = E(\xi_j^{(1)}) = 0;$

and the standard deviation $S_j$ by

(13)  $S_j^{(1)} = \frac{\sigma}{|\Delta|}\Big[\Big(1 + \sum_\nu x_\nu^2\Big)\Big(\sum_\mu A_{\mu j}^2\Big)\Big]^{1/2}.$

The second approximation to $a_j$ is also easily obtained:

(14)  $a_j^{(2)} = E(\xi_j^{(2)}).$

Both (13) and (14) were given by Etherington [1], though in a less symmetric form. Higher approximations, as he remarks, involve complicated summations; but if they should ever be required, the machinery exists in (6) and (7) for their systematic computation. As to the errors in using (13) for the standard deviation $S_j$ and (14) for the mean, we know only that

$a_j = a_j^{(2)} + o(\delta^2), \quad S_j^2 = (S_j^{(1)})^2 + o(\delta^2).$
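Formula (13) can be checked exactly for a simple error law. The sketch below is an illustration under assumptions that are not in the paper: a hypothetical 2x2 system, and errors taking the three values $-\delta$, 0, $+\delta$ with equal probability, so that $\sigma^2 = 2\delta^2/3$. It enumerates all outcomes, averages the first-order error (6) and its square, and compares the result with the square of (13).

```python
from fractions import Fraction as Fr
from itertools import product

# Hypothetical data (not taken from the paper).
A = [[Fr(4), Fr(1)], [Fr(2), Fr(3)]]
c = [Fr(1), Fr(2)]
delta = Fr(1, 100)
levels = (-delta, Fr(0), delta)      # equally likely: mean 0, symmetric
sigma2 = Fr(2, 3) * delta ** 2       # variance of that three-point law

Delta = A[0][0] * A[1][1] - A[0][1] * A[1][0]
cof = [[A[1][1], -A[1][0]], [-A[0][1], A[0][0]]]   # cofactors of A
x = [(c[0] * A[1][1] - A[0][1] * c[1]) / Delta,    # solution of (1)
     (A[0][0] * c[1] - c[0] * A[1][0]) / Delta]

def xi1(j, a_err, g_err):
    """First-order error in x_j, equation (6)."""
    total = sum(g_err[mu] * cof[mu][j] for mu in range(2))
    for nu in range(2):
        total -= x[nu] * sum(a_err[mu][nu] * cof[mu][j] for mu in range(2))
    return total / Delta

outcomes = list(product(levels, repeat=6))
mean = [Fr(0), Fr(0)]
mean_sq = [Fr(0), Fr(0)]
for e in outcomes:
    a_err = [[e[0], e[1]], [e[2], e[3]]]
    g_err = [e[4], e[5]]
    for j in range(2):
        v = xi1(j, a_err, g_err)
        mean[j] += v / len(outcomes)
        mean_sq[j] += v * v / len(outcomes)

# Square of (13): sigma^2 / Delta^2 * (1 + sum x_nu^2) * (sum_mu A_{mu j}^2).
formula13_sq = [sigma2 / Delta ** 2 * (1 + sum(v * v for v in x))
                * sum(cof[mu][j] ** 2 for mu in range(2)) for j in range(2)]
print(mean, mean_sq == formula13_sq)
```

Because $\xi_j^{(1)}$ is linear in the errors, the agreement here is exact rather than approximate; only the higher-order terms of (5) depend on more than the first two moments.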

Etherington ([1], p. 111) considers the important special case of "rounding off" decimal expressions. Each a and c is supposed correct in the $g$th decimal place, the $(g + 1)$th figure being "forced," i.e., increased by one when the $(g + 2)$th figure is dropped, if the $(g + 2)$th is 5, 6, 7, 8, or 9. Assuming constant
frequency $10^{g+1}$ in the interval $(-\tfrac{1}{2} 10^{-g-1}, \tfrac{1}{2} 10^{-g-1})$, we may use (13) and (14) with $\sigma^2 = 10^{-2g-2}/12$.

Errors of observation are often assumed to be normally distributed. There is nothing against such an assumption with regard to the γ's, but the α's must not make (3) singular, and must accordingly be suitably bounded, e.g. by (11).

6. Conclusion. The formulas and bounds of this paper involve only these quantities: the determinant $\Delta$, its first order minors, and the solutions of (1). They can be found in the course of solving (1) by orthodox methods.

Inequality (10) definitely limits the maximum solution-errors, in terms of the maximum coefficient-error $\delta$, provided $\delta$ satisfies (11). But it may be that (8), either alone or in conjunction with the second-order bound from (9), will give a better approximation.

The ratio $\sum_\mu \sum_\nu |A_{\mu\nu}| / |\Delta|$ may be taken as a "measure of sensitivity" of (1) to error.

The fundamental formulas (6) and (7) are capable of solving other problems than those studied here. For example, it may happen that only certain elements (such as those of a single column) are in error, in which case better inequalities can be found. Or the α's and γ's may not be independently and identically distributed.





REFERENCES

[1] I. M. H. Etherington, "On errors in determinants," Proc. Edinburgh Math. Soc., Ser. 2, Vol. 3 (1932), pp. 107-117.

[2] F. R. Moulton, "On the solutions of linear equations having small determinants," Amer. Math. Monthly, Vol. 20 (1913), pp. 242-249.

[3] L. B. Tuckerman, "On the mathematically significant figures in the solution of simultaneous linear equations," Annals of Math. Stat., Vol. 12 (1941), pp. 307-316.

[4] P. G. Hoel, "The errors involved in evaluating correlation determinants," Annals of Math. Stat., Vol. 11 (1940), pp. 58-65.



ON MUTUALLY FAVORABLE EVENTS 

By Kai-Lai Chung
Tsing Hua University, Kunming, China 

Introduction. For a set of arbitrary events, E. J. Gumbel, M. Fréchet and the author¹ have recently obtained inequalities between sums of certain probability functions. One of the results of the author is the following:

Let $E_1, \ldots, E_n$ be n arbitrary events and let $p_m(\nu_1, \ldots, \nu_k)$ denote the probability of the occurrence of at least m events out of the k events $E_{\nu_1}, \ldots, E_{\nu_k}$. Then, for $k = 1, \ldots, n - 1$ and $1 \le m \le k$ we have



where the summations extend respectively to all combinations of $k + 1$ and $k$ indices out of the n indices $1, \ldots, n$.

In course of proof of the above inequalities it appears that similar inequalities between products instead of sums can be obtained under certain assumptions regarding the nature of interdependence of the events. We shall first study the nature of such assumptions, and then proceed to the proof of the said inequalities (Theorems 1 and 2). It may be noted that the inductive method used here serves equally well for the proof of the inequalities cited above, though somewhat longer, but apparently our former method is not applicable here.

That events satisfying our assumptions actually exist is shown by an application to the elementary theory of numbers. The author feels incompetent to discuss other possible fields of application.

1. Let a set of events be given

$E_1, E_2, \ldots, E_n, \ldots$

and let $E_i'$ denote the event non-$E_i$. Let $p(i)$ denote the probability of the occurrence of $E_i$, $p(i')$ that of the occurrence of $E_i'$. For convenience we assume that for any $i$, $p(i)(1 - p(i)) \ne 0$; events with the exceptional probabilities 0 or 1 may evidently be left out of account.

Let $p(\nu_1 \cdots \nu_k)$ denote the probability of the occurrence of the conjunction $E_{\nu_1} \cdots E_{\nu_k}$, and let $p(\mu_1 \cdots \mu_h, \nu_1 \cdots \nu_k)$ denote the probability of the occurrence of $E_{\nu_1} \cdots E_{\nu_k}$, on the hypothesis that $E_{\mu_1}, \ldots, E_{\mu_h}$ have occurred. The μ's or ν's may be accented.

Definition 1: If $p(\nu_1, \nu_2) > p(\nu_2)$, we say that the occurrence of the event $E_{\nu_1}$ is favorable to the occurrence of the event $E_{\nu_2}$, or simply that $E_{\nu_1}$ is favorable to $E_{\nu_2}$.


¹ "On the probability of the occurrence of at least m events among n arbitrary events," Annals of Math. Stat., Vol. 12 (1941), pp. 328-338.



If $p(\nu_1, \nu_2) = p(\nu_2)$, we say that $E_{\nu_1}$ is indifferent to $E_{\nu_2}$. If $p(\nu_1, \nu_2) < p(\nu_2)$, we say that $E_{\nu_1}$ is unfavorable to $E_{\nu_2}$.

Thus the relations "favorableness," "indifference," and "unfavorableness" are mutually exclusive and together exhaustive. We state the following immediate consequences:

(i) Reflexivity: An event is favorable to itself; in fact, $p(\nu, \nu) = 1 > p(\nu)$.

(ii) Symmetry: If $E_1$ is favorable (indifferent, unfavorable) to $E_2$, then $E_2$ is favorable (indifferent, unfavorable) to $E_1$. In fact, we have

$p(1)\,p(1, 2) = p(12) = p(2)\,p(2, 1),$

$\frac{p(1, 2)}{p(2)} = \frac{p(2, 1)}{p(1)}.$

Thus $p(1, 2) \gtreqless p(2)$ is equivalent to $p(2, 1) \gtreqless p(1)$.

In particular, if $E_1$ is indifferent to $E_2$, then so is $E_2$ to $E_1$. They are then usually said to be independent of each other.

(iii) If $E_1$ is favorable (indifferent, unfavorable) to $E_2$, then $E_1'$ is unfavorable (indifferent, favorable) to $E_2$. For, we have

$p(1)\,p(1, 2) + p(1')\,p(1', 2) = p(12) + p(1'2) = p(2),$

whence

$p(1')\,p(1', 2) = p(2) - p(1)\,p(1, 2).$

On the other hand,

$p(1')\,p(2) = [1 - p(1)]\,p(2) = p(2) - p(1)\,p(2).$

Since by assumption $p(1')\,p(2) \ne 0$, we have

$\frac{p(1', 2)}{p(2)} = \frac{p(2) - p(1)\,p(1, 2)}{p(2) - p(1)\,p(2)}.$

Thus

$p(1', 2) \lesseqgtr p(2)$ according as $p(1, 2) \gtreqless p(2)$.

For the sake of brevity we introduce the following symbolic notation:

$E_1/E_2 = \begin{cases} 1, & \text{if } E_1 \text{ is favorable to } E_2, \\ 0, & \text{if } E_1 \text{ is indifferent to } E_2, \\ -1, & \text{if } E_1 \text{ is unfavorable to } E_2. \end{cases}$

Then by (ii) and (iii) we have

$E_1/E_2 = E_2/E_1,$

$E_1'/E_2 = E_2/E_1' = E_1/E_2' = E_2'/E_1 = -(E_1/E_2),$

$E_1'/E_2' = E_2'/E_1' = E_1/E_2,$

analogous to the rules of signs in the multiplication of integers.





(iv) Non-transitivity: If $E_1$ is favorable to $E_2$, and $E_2$ is favorable to $E_3$, it does not necessarily follow that $E_1$ is favorable to $E_3$; in fact, it may happen that $E_1$ is unfavorable to $E_3$. For instance, imagine 11 identical balls in a bag marked respectively with the numbers

-11, -10, -3, -2, -1, 2, 4, 6, 11, 13, 16.

Let a ball be drawn at random. Let

$E_1$ = (the event of the number on the ball being positive)

$E_2$ = (the event of the number on the ball being even)

$E_3$ = (the event of the number on the ball being of 1 digit)

We have

$p(1, 2) = 2/3 > 6/11 = p(2),$
$p(2, 3) = 2/3 > 6/11 = p(3),$
$p(1, 3) = 1/2 < 6/11 = p(3).$

(v) It may happen that $E_2/E_1 = 1$, $E_3'/E_1 = 1$, but $E_2E_3'/E_1 = -1$. In the example above,

$p(2, 1) = 2/3 > 6/11 = p(1),$
$p(3', 1) = 3/5 > 6/11 = p(1),$
$p(23', 1) = 1/2 < 6/11 = p(1).$

(vi) It may happen that $E_1/E_2 = 1$, $E_1/E_3' = 1$, but $E_1/E_2E_3' = -1$. Example:

$p(1, 2) = 2/3 > 6/11 = p(2),$
$p(1, 3') = 1/2 > 5/11 = p(3'),$
$p(1, 23') = 1/6 < 2/11 = p(23').$
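The numerical values in examples (iv)-(vi) can be recomputed directly from the eleven numbers on the balls. The sketch below uses a hypothetical helper `p(hypothesis, event)` that mirrors the paper's notation $p(\mu, \nu)$, with the hypothesis written before the comma; the helper name and the lambda predicates are illustrative choices, not part of the paper.

```python
from fractions import Fraction

balls = [-11, -10, -3, -2, -1, 2, 4, 6, 11, 13, 16]

E1 = lambda v: v > 0            # positive
E2 = lambda v: v % 2 == 0       # even
E3 = lambda v: abs(v) < 10      # one digit
not_E3 = lambda v: not E3(v)
E2_and_not_E3 = lambda v: E2(v) and not E3(v)

def p(hypothesis, event):
    """Probability of `event` given `hypothesis`; pass None for no hypothesis."""
    pool = [v for v in balls if hypothesis is None or hypothesis(v)]
    return Fraction(sum(1 for v in pool if event(v)), len(pool))

# (iv): E1 favorable to E2, E2 favorable to E3, yet E1 unfavorable to E3.
iv = (p(E1, E2), p(None, E2), p(E2, E3), p(None, E3), p(E1, E3))
# (v): E2 and E3' each favorable to E1, their conjunction unfavorable to E1.
v_ = (p(E2, E1), p(not_E3, E1), p(E2_and_not_E3, E1), p(None, E1))
# (vi): E1 favorable to E2 and to E3', unfavorable to their conjunction.
vi = (p(E1, not_E3), p(None, not_E3), p(E1, E2_and_not_E3), p(None, E2_and_not_E3))
print(iv, v_, vi)
```

Each of the three cases reproduces the pattern of inequalities stated in the text.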

(vii) It may happen that $E_1/E_3 = 1$, $E_2/E_3 = 1$, but the disjunction $(E_1 + E_2)/E_3 = -1$. For, by (v) we know that there exist events $E_1', E_2', E_3'$ such that

$E_1'/E_3' = 1, \quad E_2'/E_3' = 1, \quad E_1'E_2'/E_3' = -1.$

Hence by (iii) there exist events $E_1, E_2, E_3$ such that

$E_1/E_3 = 1, \quad E_2/E_3 = 1, \quad (E_1'E_2')'/E_3 = -1.$

But $(E_1'E_2')' = E_1 + E_2$. Thus the last relation is $(E_1 + E_2)/E_3 = -1$.

(viii) It may happen that $E_3/E_1 = 1$, $E_3/E_2 = 1$, but $E_3/(E_1 + E_2) = -1$.

This follows from (vi) as (vii) follows from (v).

After all these negative results in (iv)-(viii), we see that we cannot expect to go far without making stronger assumptions regarding the nature of interdependence between the events in the set. Firstly, in view of (iv), we shall restrict ourselves to consideration of a set of events in which each event is favorable to every other. Secondly, in view of (v), we shall only consider the case where the "favorableness," as defined above, shall be cumulative in its effect, that is to say, the more events favorable to a given event have been known to occur, the more probable this given event shall be esteemed. We formulate these two conditions in mathematical terms, as follows:

Definition 2: A set of events $E_1, \ldots, E_n, \ldots$ is said to be strongly mutually favorable (in the first sense) if, for every integer h and every set of distinct indices (positive integers) $\mu_1, \ldots, \mu_h$ and $\nu$ we have

$p(\mu_1 \cdots \mu_h, \nu) > p(\mu_1 \cdots \mu_{h-1}, \nu).$

This definition requires that there exist no implication relation between any event and any conjunction of events in the set; in particular, that the events are all distinct. It would be more convenient to consider the relation "favorable or indifferent to." This will be done later on. The present definitions have the advantage of being logically clear cut and also that of yielding unambiguous inequalities.

From Definition 2 we deduce the following consequences:

(1) If the set $(\mu_1', \ldots, \mu_g')$ is a sub-set of $(\mu_1, \ldots, \mu_h)$, we have

$p(\mu_1 \cdots \mu_h, \nu) > p(\mu_1' \cdots \mu_g', \nu).$

(2) For any positive integer k and any two sets $(\nu_1, \ldots, \nu_k)$ and $(\mu_1, \ldots, \mu_h)$ where all the indices are distinct, we have

$p(\mu_1 \cdots \mu_h, \nu_1 \cdots \nu_k) > p(\mu_1 \cdots \mu_{h-1}, \nu_1 \cdots \nu_k).$

More generally, we have as in (1),

$p(\mu_1 \cdots \mu_h, \nu_1 \cdots \nu_k) > p(\mu_1' \cdots \mu_g', \nu_1 \cdots \nu_k).$

Proof: We have only to prove the first inequality. For $k = 1$ this is the assumption in Definition 2. Suppose that the inequality holds for $k - 1$; we shall prove that it holds for $k$, too.

$\frac{p(\mu_1 \cdots \mu_h, \nu_1 \cdots \nu_k)}{p(\mu_1 \cdots \mu_{h-1}, \nu_1 \cdots \nu_k)} = \frac{p(\mu_1 \cdots \mu_{h-1})\, p(\mu_1 \cdots \mu_h)\, p(\mu_1 \cdots \mu_h, \nu_1 \cdots \nu_k)}{p(\mu_1 \cdots \mu_h)\, p(\mu_1 \cdots \mu_{h-1})\, p(\mu_1 \cdots \mu_{h-1}, \nu_1 \cdots \nu_k)}$

$= \frac{p(\mu_1 \cdots \mu_{h-1})\, p(\mu_1 \cdots \mu_h \nu_1 \cdots \nu_k)}{p(\mu_1 \cdots \mu_h)\, p(\mu_1 \cdots \mu_{h-1} \nu_1 \cdots \nu_k)}$

$= \frac{p(\mu_1 \cdots \mu_{h-1})\, p(\mu_1 \cdots \mu_h)\, p(\mu_1 \cdots \mu_h, \nu_1)\, p(\mu_1 \cdots \mu_h \nu_1, \nu_2 \cdots \nu_k)}{p(\mu_1 \cdots \mu_h)\, p(\mu_1 \cdots \mu_{h-1})\, p(\mu_1 \cdots \mu_{h-1}, \nu_1)\, p(\mu_1 \cdots \mu_{h-1} \nu_1, \nu_2 \cdots \nu_k)}$

$= \frac{p(\mu_1 \cdots \mu_h, \nu_1)}{p(\mu_1 \cdots \mu_{h-1}, \nu_1)} \cdot \frac{p(\mu_1 \cdots \mu_h \nu_1, \nu_2 \cdots \nu_k)}{p(\mu_1 \cdots \mu_{h-1} \nu_1, \nu_2 \cdots \nu_k)}$

$> \frac{p(\mu_1 \cdots \mu_h \nu_1, \nu_2 \cdots \nu_k)}{p(\mu_1 \cdots \mu_{h-1} \nu_1, \nu_2 \cdots \nu_k)} > 1.$



Observe that none of the denominators vanish by our original assumption and by Definition 2.

Therefore we see that when the failure in (v) is remedied by our definition, the failure in (vi) is automatically remedied too.


2. Theorem 1: Let $n > 1$ and let $E_1, \ldots, E_n$ be a set of strongly mutually favorable events (in the first sense). Then we have, for $k = 1, \ldots, n - 1$,

$\Big\{\prod p(\nu_1 \cdots \nu_{k+1})\Big\}^{1/\binom{n-1}{k}} > \Big\{\prod p(\nu_1 \cdots \nu_k)\Big\}^{1/\binom{n-1}{k-1}},$

where the products extend respectively to all combinations of $k + 1$ and $k$ distinct indices out of the indices $1, \ldots, n$.

Proof. We may assume that the indices are written so that

$1 \le \nu_1 < \nu_2 < \cdots \le n.$

Taking logarithms, we have

$\binom{n-1}{k-1} \sum \log p(\nu_1 \cdots \nu_{k+1}) > \binom{n-1}{k} \sum \log p(\nu_1 \cdots \nu_k).$

Substituting from the obvious formula

$p(\nu_1 \cdots \nu_l) = p(\nu_1)\, p(\nu_1, \nu_2)\, p(\nu_1\nu_2, \nu_3) \cdots p(\nu_1 \cdots \nu_{l-1}, \nu_l),$

and writing $\log p(\cdots) = q(\cdots)$, the inequality becomes

(1)  $\binom{n-1}{k-1} \sum \{q(\nu_1) + q(\nu_1, \nu_2) + \cdots + q(\nu_1 \cdots \nu_k, \nu_{k+1})\} > \binom{n-1}{k} \sum \{q(\nu_1) + q(\nu_1, \nu_2) + \cdots + q(\nu_1 \cdots \nu_{k-1}, \nu_k)\}.$

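Immediately we observe that the number of terms of the form $q(\nu_1 \cdots \nu_s, \mu)$ $(0 \le s \le \mu - 1)$ with a fixed $\mu$ after the comma in the bracket is the same on both sides, since

(2)  $\binom{n-1}{k-1}\binom{n-1}{k} = \binom{n-1}{k}\binom{n-1}{k-1}.$

Let the sums of such q's on the left and right of (1) be $\sigma^{(1)} = \sigma^{(1)}(\mu)$ and $\sigma^{(2)} = \sigma^{(2)}(\mu)$ respectively. To prove our theorem it is sufficient to prove that $\sigma^{(1)}(\mu) \ge \sigma^{(2)}(\mu)$ for every $\mu$ and $\sigma^{(1)}(\mu) > \sigma^{(2)}(\mu)$ for at least one $\mu$.

Now the terms in $\sigma^{(1)}$ (or $\sigma^{(2)}$) fall into classes according to the number $s$ of the ν's before the comma in the bracket. Let those terms having $s$ ν's before the comma belong to the s-th class. It is evident that the number of terms of the s-th class in $\sigma^{(1)}(\mu)$ is equal to

$\binom{n-1}{k-1}\binom{\mu-1}{s}\binom{n-\mu}{k-s}$

for $s = 0, 1, \ldots, \mu - 1$; where we make the convention that

$\binom{0}{0} = 1, \quad \binom{a}{b} = 0$ if $a < b$ or if $b < 0$.

Thus for a fixed $\mu$, when the terms in $\sigma^{(1)}(\mu)$ are classified in the above manner, its total number of terms may be written as the following sum, in which vanishing terms may occur:

$\binom{n-1}{k-1}\binom{n-1}{k} = \binom{n-1}{k-1}\Big\{\binom{\mu-1}{\mu-1}\binom{n-\mu}{k-\mu+1} + \cdots + \binom{\mu-1}{0}\binom{n-\mu}{k}\Big\}.$

Similarly the total number of terms in $\sigma^{(2)}(\mu)$ may be written as the following sum:

$\binom{n-1}{k}\binom{n-1}{k-1} = \binom{n-1}{k}\Big\{\binom{\mu-1}{\mu-1}\binom{n-\mu}{k-\mu} + \cdots + \binom{\mu-1}{s}\binom{n-\mu}{k-s-1} + \cdots + \binom{\mu-1}{0}\binom{n-\mu}{k-1}\Big\}.$

Lemma 1: For $0 \le s \le k$, we have, taking account of our conventions about the binomial coefficients,

(3)  $\binom{n-1}{k-1}\binom{n-\mu}{k-s} \ge \binom{n-1}{k}\binom{n-\mu}{k-s-1}$  if $s > (\mu - 1)k/n$,

(4)  $\binom{n-1}{k-1}\binom{n-\mu}{k-s} \le \binom{n-1}{k}\binom{n-\mu}{k-s-1}$  if $s \le (\mu - 1)k/n$.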
Proof: Suppose $s \ge k - n + \mu$, then

$\binom{n-1}{k-1}\binom{n-\mu}{k-s} \gtreqless \binom{n-1}{k}\binom{n-\mu}{k-s-1}$

according as

$\frac{k}{n-k} \gtreqless \frac{k-s}{n - \mu - k + s + 1},$

i.e. according as

$s \gtreqless (\mu - 1)k/n.$

But, since $k < n$ and $\mu \le n$, we have

$n - k - k/n + 1 > (n - k)\mu/n,$

$(\mu - 1)k/n > k - n + \mu - 1,$

so that

$(\mu - 1)k/n + 1 > k - n + \mu.$

Therefore if $s > (\mu - 1)k/n$, then $s + 1 > (\mu - 1)k/n + 1 > k - n + \mu$, so that $s \ge k - n + \mu$ and (3) holds.

Again, if $k - n + \mu \le s \le (\mu - 1)k/n$, then (4) holds; while if $s < k - n + \mu$, then the left-hand side of (4) vanishes while the right-hand side is non-negative; thus (4) holds for $s \le (\mu - 1)k/n$. The lemma is proved.

If we put

$d_s = \binom{n-1}{k-1}\binom{n-\mu}{k-s} - \binom{n-1}{k}\binom{n-\mu}{k-s-1} \quad (s = 0, 1, \ldots, k),$

then by Lemma 1,

$d_s \gtreqless 0$ according as $s \gtreqless (\mu - 1)k/n$.
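Lemma 1 lends itself to a brute-force check for small $n$. The sketch below assumes $d_s$ is the difference between the left and right sides of (3) and (4), and uses a hypothetical helper `binom` that implements the convention $\binom{a}{b} = 0$ for $a < b$ or $b < 0$. It also evaluates the sums of the $d_s$ weighted by $\binom{\mu-1}{s}$, that is, by the number of terms in the s-th class; the weighting is an assumption drawn from the class counts given before the lemma.

```python
from math import comb

def binom(a, b):
    """Binomial coefficient with the convention that it vanishes outside 0 <= b <= a."""
    return comb(a, b) if 0 <= b <= a else 0

def d(n, k, mu, s):
    """Left side minus right side of (3) and (4)."""
    return (binom(n - 1, k - 1) * binom(n - mu, k - s)
            - binom(n - 1, k) * binom(n - mu, k - s - 1))

def lemma1_holds(n, k, mu):
    """d_s >= 0 when s > (mu-1)k/n and d_s <= 0 otherwise, for 0 <= s <= k."""
    for s in range(k + 1):
        value = d(n, k, mu, s)
        if s * n > (mu - 1) * k:
            if value < 0:
                return False
        elif value > 0:
            return False
    return True

def weighted_partial_sums(n, k, mu):
    """Running sums of binom(mu-1, s) * d_s for s = 0, ..., mu-1."""
    sums, total = [], 0
    for s in range(mu):
        total += binom(mu - 1, s) * d(n, k, mu, s)
        sums.append(total)
    return sums

cases = [(n, k, mu) for n in range(2, 10)
         for k in range(1, n) for mu in range(1, n + 1)]
print(all(lemma1_holds(*c) for c in cases))
print(weighted_partial_sums(3, 1, 2))
```

For every case tried the signs agree with the lemma, the weighted sums never rise above zero, and the full weighted sum is zero, in line with the equal term counts expressed by (2).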

This means that although the total number of terms of the form $q(\nu_1 \cdots \nu_s, \mu)$ is the same on both sides of (1), the left-hand side is more abundant in terms with larger $s$ while the right-hand side is more abundant in terms with smaller $s$. Now we have

$q(\mu_1 \cdots \mu_i, \mu) > q(\mu_1' \cdots \mu_j', \mu)$

if $i > j$ and if $(\mu_1' \cdots \mu_j')$ is a subset of $(\mu_1 \cdots \mu_i)$. Hence it is natural to suppose that the left-hand side must be greater because it is more abundant in terms of larger values. Unfortunately even if $i > j$, the last inequality is in general not true if the set $(\mu_1' \cdots \mu_j')$ is not a sub-set of $(\mu_1 \cdots \mu_i)$. Therefore we cannot as yet conclude that $\sigma^{(1)} \ge \sigma^{(2)}$.

To prove that actually we have $\sigma^{(1)} \ge \sigma^{(2)}$ we make the following "process of compensation":

We have, by (2) and the definition of $d_s$, the following equality:

$d_0 + \binom{\mu-1}{1} d_1 + \cdots + \binom{\mu-1}{\mu-1} d_{\mu-1} = 0,$

where $d_i = 0$ if $i > k$. Thus

$d_s \le 0$ for $s \le k(\mu - 1)/n$,
$d_s \ge 0$ for $s > k(\mu - 1)/n$.

Hence

(5)  $d_0 + \binom{\mu-1}{1} d_1 + \cdots + \binom{\mu-1}{l} d_l \le 0$

for $l = 0, 1, \ldots, \mu - 1$.





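For the fixed $\mu$, let

$\rho_l^{(1)} = \binom{n-1}{k-1}\Big\{\binom{n-\mu}{k} q(\mu) + \binom{n-\mu}{k-1} \sum_{\mu_1} q(\mu_1, \mu) + \cdots + \binom{n-\mu}{k-l} \sum_{1 \le \mu_1 < \cdots < \mu_l < \mu} q(\mu_1 \cdots \mu_l, \mu)\Big\},$

$\rho_l^{(2)} = \binom{n-1}{k}\Big\{\binom{n-\mu}{k-1} q(\mu) + \binom{n-\mu}{k-2} \sum_{\mu_1} q(\mu_1, \mu) + \cdots + \binom{n-\mu}{k-l-1} \sum_{1 \le \mu_1 < \cdots < \mu_l < \mu} q(\mu_1 \cdots \mu_l, \mu)\Big\},$

so that

$\rho_{\mu-1}^{(i)} = \sigma^{(i)}(\mu) \quad (i = 1, 2).$

For $\mu = 1$, $l = 0$, we have

$\sigma^{(1)}(1) = \rho_0^{(1)} = \rho_0^{(2)} = \sigma^{(2)}(1).$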

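Lemma 2: For $\mu > 1$ and $0 \le l < \mu - 1$, we have

$\frac{1}{\binom{\mu-1}{l}} \sum_{1 \le \mu_1 < \cdots < \mu_l < \mu} q(\mu_1 \cdots \mu_l, \mu) < \frac{1}{\binom{\mu-1}{l+1}} \sum_{1 \le \mu_1 < \cdots < \mu_{l+1} < \mu} q(\mu_1 \cdots \mu_{l+1}, \mu).$

Proof: We have, for any $\nu < \mu$, $\nu \ne \mu_i$ $(i = 1, \ldots, l)$,

$q(\mu_1 \cdots \mu_l \nu, \mu) > q(\mu_1 \cdots \mu_l, \mu).$

Summing with respect to all such ν's,

$\sum_\nu q(\mu_1 \cdots \mu_l \nu, \mu) > (\mu - l - 1)\, q(\mu_1 \cdots \mu_l, \mu).$

Summing with respect to all $1 \le \mu_1 < \cdots < \mu_l < \mu$,

$\sum \sum_\nu q(\mu_1 \cdots \mu_l \nu, \mu) = (l + 1) \sum_{1 \le \mu_1 < \cdots < \mu_{l+1} < \mu} q(\mu_1 \cdots \mu_{l+1}, \mu) > (\mu - l - 1) \sum_{1 \le \mu_1 < \cdots < \mu_l < \mu} q(\mu_1 \cdots \mu_l, \mu).$

Since $\binom{\mu-1}{l+1} \big/ \binom{\mu-1}{l} = (\mu - l - 1)/(l + 1)$, the lemma is proved.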

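Now we use induction to prove that for $\mu > 1$ and $l = 1, \ldots, \mu - 1$

$\rho_l^{(1)} - \rho_l^{(2)} > \frac{d_0 + \binom{\mu-1}{1} d_1 + \cdots + \binom{\mu-1}{l} d_l}{\binom{\mu-1}{l}} \sum q(\mu_1 \cdots \mu_l, \mu) \ge 0.$

This inequality holds for $l = 1$ by Lemma 2. Assume that it holds for $l$ $(l < \mu - 1)$. Then we have, by (5), Lemma 2 and the fact that each $q < 0$,

$\rho_{l+1}^{(1)} - \rho_{l+1}^{(2)} = \rho_l^{(1)} - \rho_l^{(2)} + d_{l+1} \sum q(\mu_1 \cdots \mu_{l+1}, \mu)$

$> \frac{d_0 + \binom{\mu-1}{1} d_1 + \cdots + \binom{\mu-1}{l} d_l}{\binom{\mu-1}{l}} \sum q(\mu_1 \cdots \mu_l, \mu) + d_{l+1} \sum q(\mu_1 \cdots \mu_{l+1}, \mu)$

$\ge \Big\{\frac{d_0 + \binom{\mu-1}{1} d_1 + \cdots + \binom{\mu-1}{l} d_l}{\binom{\mu-1}{l+1}} + d_{l+1}\Big\} \sum q(\mu_1 \cdots \mu_{l+1}, \mu)$

$= \frac{d_0 + \binom{\mu-1}{1} d_1 + \cdots + \binom{\mu-1}{l+1} d_{l+1}}{\binom{\mu-1}{l+1}} \sum q(\mu_1 \cdots \mu_{l+1}, \mu) \ge 0.$

Therefore, for $\mu > 1$, we have

$\sigma^{(1)}(\mu) - \sigma^{(2)}(\mu) = \rho_{\mu-1}^{(1)} - \rho_{\mu-1}^{(2)} > 0.$

Since $n > 1$ and $1 \le \mu \le n$, there exists a $\mu > 1$. Hence

$\sum_\mu \sigma^{(1)}(\mu) > \sum_\mu \sigma^{(2)}(\mu),$

which is equivalent to the inequality (1).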


3. Our next step will be to obtain a generalization of Theorem 1. Consider a derived event defined by a disjunction of a (finite) number of events in the set, as follows.

$E_{\nu_1} + E_{\nu_2} + \cdots + E_{\nu_m}.$

We call such a disjunction a disjunction of the m-th order.

Definition 3: A set of events is said to be strongly mutually favorable in the second sense if for every positive integer m, the derived set of events consisting of all the disjunctions of the m-th order forms a strongly mutually favorable set of events (in the first sense).

Let $D = D(m)$ denote in general a disjunction of the m-th order; let $p(D_1 \cdots D_h, D)$ denote the probability of the occurrence of the disjunction $D$, on the hypothesis that the conjunction of the h disjunctions $D_1 \cdots D_h$ has occurred. Then Definition 3 says that for any positive integer h and any set of distinct D's we have

$p(D_1 \cdots D_h, D) > p(D_1 \cdots D_{h-1}, D).$

Since a disjunction of the 1st order is an event $E$, we see that Definition 3 includes Definition 2.



Let $D_m(\nu_1, \ldots, \nu_k)$, $\nu_1 < \cdots < \nu_k$, denote the derived event

$\prod (E_{\lambda_1} + \cdots + E_{\lambda_m}),$

where the product (conjunction) extends to all combinations of m indices out of the indices $\nu_1, \ldots, \nu_k$. Let $P_m(\nu_1, \ldots, \nu_k)$ denote the probability of the occurrence of $D_m(\nu_1, \ldots, \nu_k)$. It is seen that $P_1(\nu_1, \ldots, \nu_k) = p(\nu_1 \cdots \nu_k)$ in our previous notation.

We merely state Theorem 2, whose proof is analogous to that of Theorem 1 but requires more cumbersome expressions.

Theorem 2: Let $n > k \ge m \ge 1$ and let $E_1, \ldots, E_n$ be a set of mutually strongly favorable events in the second sense. Then we have

To give an interpretation of $P_m(\nu_1, \ldots, \nu_k)$, we prove the symbolic equation between events

$D_m = \prod (E_{\lambda_1} + \cdots + E_{\lambda_m}) = \sum E_{\lambda_1} \cdots E_{\lambda_{k-m+1}} = C_{k-m+1},$

where product means conjunction and sum means disjunction.

To prove this, we write for a general event $E$, $E = 1$ when $E$ occurs, $E = 0$ when $E$ does not occur. Now if $C_{k-m+1} = 0$, then at most $k - m$ events among the k given events occur, so that there exist m events such that $E_{\lambda_1} = 0$, $E_{\lambda_2} = 0$, ..., $E_{\lambda_m} = 0$; thus

$E_{\lambda_1} + E_{\lambda_2} + \cdots + E_{\lambda_m} = 0.$

Now the last disjunction is contained in $D_m$ as a factor, therefore $D_m = 0$. Conversely, if $D_m = 0$, at least one of its factors $= 0$, so that there exist m events such that $E_{\lambda_1} = 0$, $E_{\lambda_2} = 0$, ..., $E_{\lambda_m} = 0$. Thus at most $k - m$ events out of the k given events occur and so by definition $C_{k-m+1} = 0$. Q.e.d.

From the above it immediately follows that

$P_m(\nu_1, \ldots, \nu_k) = p_{k-m+1}(\nu_1, \ldots, \nu_k),$

where $p_{k-m+1}(\nu_1, \ldots, \nu_k)$ is defined in the Introduction. Then Theorem 2 may be written as


or again as 





> Il[W„_l(»'(7 



(“) 


-1 



KAI't^I C-HCNG 


where* , • • ■ , df'Uf)tf}< the proJuihility of the or.currerice of at most 

m. ~ 1 events mil of the k events K[^ , • , E',^. 

REMARK. If in our Definitions 2 and 3 we replace the sign ">" by the sign "$\ge$", then we obtain the inequalities in Theorems 1 and 2 with the sign ">" replaced by "$\ge$". The corresponding set of events thus newly defined will be said to be strongly mutually favorable or indifferent (in the first or second sense).

After this modification, we can include events with the probability 1 in our considerations. Also, the events need no longer be distinct and there may now exist implication relations between events or their conjunctions. This modification is useful for the following application.

4. Consider the divisibility of a random positive integer by the set of positive integers. To each positive integer there corresponds an event, namely the event that the random positive integer is divisible by it. The enumerable set of events

$E_1, E_2, E_3, E_4, \ldots, E_n, \ldots$

where $E_n$ = the event of divisibility by n, with the probabilities

$1, \tfrac{1}{2}, \tfrac{1}{3}, \tfrac{1}{4}, \ldots, \tfrac{1}{n}, \ldots$

evidently forms a set of strongly mutually favorable or indifferent events in the second sense.
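The divisibility events give a concrete setting in which Definition 2 (with "$\ge$" as in the Remark) and the conclusion of Theorem 1 can be tested numerically. The sketch below rests on one assumption not spelled out in the text: the probability that a random positive integer is divisible by every one of a set of numbers is taken to be the reciprocal of their least common multiple. The function names are hypothetical. The Theorem 1 inequality is checked in an equivalent integer form, obtained by raising both sides to the power $\binom{n-1}{k}\binom{n-1}{k-1}$ and inverting.

```python
from fractions import Fraction
from functools import reduce
from itertools import combinations, permutations
from math import comb, gcd

def lcm(*nums):
    """Least common multiple; the empty call returns 1."""
    return reduce(lambda a, b: a * b // gcd(a, b), nums, 1)

def p_joint(indices):
    """Assumed probability that a random integer is divisible by all indices."""
    return Fraction(1, lcm(*indices))

def p_cond(given, v):
    """p(given, v): probability of divisibility by v, given divisibility by `given`."""
    return p_joint(tuple(given) + (v,)) / p_joint(given)

def definition2_holds(numbers):
    """Check p(mu_1..mu_h, v) >= p(mu_1..mu_{h-1}, v) for distinct indices."""
    for size in range(1, len(numbers)):
        for given in permutations(numbers, size):
            for v in numbers:
                if v not in given and p_cond(given, v) < p_cond(given[:-1], v):
                    return False
    return True

def theorem1_holds(numbers):
    """Integer form: (prod lcm over (k+1)-sets)^C(n-1,k-1) <= (prod lcm over k-sets)^C(n-1,k)."""
    n = len(numbers)
    def lcm_product(size):
        out = 1
        for subset in combinations(numbers, size):
            out *= lcm(*subset)
        return out
    return all(lcm_product(k + 1) ** comb(n - 1, k - 1)
               <= lcm_product(k) ** comb(n - 1, k) for k in range(1, n))

print(definition2_holds((2, 3, 4, 6)), theorem1_holds((2, 3, 4, 6)))
```

With these conventions $p(\mu_1 \cdots \mu_h, \nu)$ reduces to $\gcd(L, \nu)/\nu$, where $L$ is the least common multiple of the $\mu$'s, which makes the monotonicity in Definition 2 easy to see for this family.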

Again, the enumerable set of events
where $E_n'$ = the event of non-divisibility by n, with the probabilities

$\tfrac{1}{2}, \tfrac{2}{3}, \tfrac{3}{4}, \ldots, \tfrac{n-1}{n}, \ldots$

evidently also forms a set of strongly mutually favorable or indifferent events in the second sense.

Hence our Theorem 2 can be applied to both sets and in this way we obtain results which belong properly to the elementary theory of numbers.

We shall content ourselves with indicating a few examples.

Let $[a_1, \ldots, a_n]$ denote the least common multiple of the natural numbers $a_1, \ldots, a_n$. Then Theorem 1, when applied to the two sets above, gives respectively

Theorem 1.1: Let $a_1, \ldots, a_n$ be any positive integers, then we have, for $k = 1, \ldots, n - 1$


( - 1 -A(" 

<n+ia'' lei'll '' ‘ I ^'it+i]/ 


^ ( n 






Theorem 1.2: Also we have, 


igi'i 


n (i- £ E _i_ 






{flu ) ■ ■ ■ ) Cl| 


1 \i-rT 


‘n+i 


s n (i- E i+ £ ^ 


- + '•• + (- 1 )* 
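A trivial corollary of Theorem 1 is

$p(12 \cdots n) \ge p(1)\,p(2) \cdots p(n).$

Correspondingly we have

$1 - \sum_i \frac{1}{a_i} + \sum_{i < j} \frac{1}{[a_i, a_j]} - \cdots + (-1)^n \frac{1}{[a_1, \ldots, a_n]} \ge \Big(1 - \frac{1}{a_1}\Big)\Big(1 - \frac{1}{a_2}\Big) \cdots \Big(1 - \frac{1}{a_n}\Big).$

If we multiply by $a_1 a_2 \cdots a_n$, we get

$A(a_1, a_2, \ldots, a_n) \ge (a_1 - 1)(a_2 - 1) \cdots (a_n - 1),$

where $A(a_1, \ldots, a_n)$ denotes the number of positive integers $\le a_1 a_2 \cdots a_n$ that are not divisible by any of the $a_i$ $(i = 1, \ldots, n)$.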
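The last inequality is easy to try by direct counting. The following sketch uses a hypothetical function `count_not_divisible` that computes $A(a_1, \ldots, a_n)$ by brute force, which is practical only for small products.

```python
from math import prod

def count_not_divisible(a):
    """A(a_1, ..., a_n): integers in 1..a_1*...*a_n divisible by none of the a_i."""
    limit = prod(a)
    return sum(1 for v in range(1, limit + 1) if all(v % ai for ai in a))

def lower_bound(a):
    """Right-hand side (a_1 - 1)(a_2 - 1)...(a_n - 1)."""
    return prod(ai - 1 for ai in a)

for a in [(2, 3), (2, 4), (4, 6), (2, 3, 5)]:
    print(a, count_not_divisible(a), lower_bound(a))
```

Equality occurs in these trials when the $a_i$ are pairwise coprime, as for (2, 3) and (2, 3, 5); this matches the case in which the corresponding events are independent.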

This last result, which is almost obvious here, was proved by H. Rohrbach and H. Heilbronn independently.² See also my generalization³ (also obvious from the present point of view) of this result to higher dimensional sets of positive integers and to sets of ideals in any algebraic number field.


² "Beweis einer zahlentheoretischen Ungleichung," Jour. für Math., Vol. 177 (1937), pp. 193-196. "On an inequality in the elementary theory of numbers," Proc. Camb. Phil. Soc., Vol. 33 (1937), pp. 207-209.

³ "A generalization of an inequality in the elementary theory of numbers," Jour. für Math., Vol. 183 (1941), p. 103.



OBSERVATIONS ON ANALYSIS OF VARIANCE THEORY 
By Hilda Geiringer¹

Bryn Mawr College

One of the important problems of theoretical statistics is the following. Let $x_1, x_2, \ldots, x_N$ be the results of N observations; by means of these results we want to test the hypothesis that $V_i(x)$ is the distribution of the ith chance variable $x_i$. In that situation we often decide to choose a test function $F(x_1, x_2, \ldots, x_N)$ and to determine the distribution of $F$ under the above assumption. By means of this distribution we compute the probability of $f_1 \le F \le f_2$ and compare this result with the observed value of $F$.

Suppose there are m groups, each of n observations on $m \cdot n$ chance variables $x_{\mu\nu}$. We may test hypotheses regarding the mn distributions of the $x_{\mu\nu}$ in the way just mentioned. In analysis of variance theory we often use as test functions certain quadratic forms $s_w^2$ and $s_a^2$ ("variance within" and "among classes") and their quotient (multiplied by $m(n - 1)/(m - 1)$), usually denoted by z. Its distribution has been investigated by R. A. Fisher [2] under the assumption that the chance variables are mutually independent and subject to the same normal law. "The five per cent and one per cent points of this distribution have been tabulated by R. A. Fisher and are used to test whether these two estimates of the same magnitude are significantly different. One gets thus a test of significance to test whether our sample is a random sample from a homogeneous normal population or not.² If the probability of a certain z-value is too small we shall reject the hypothesis that the sample is a random sample from a homogeneous normal population" [5].

The use of Fisher's z-test is also recommended if we may reasonably assume that the theoretical distributions are approximately normal. "Unless some rather startling lack of normality is known or suspected analysis of variance may be used with confidence." This last remark can be understood by considering that, as we will see in detail, some of the basic results of our theory are independent of the normality of the populations. It is however this assumption of normality which makes possible the complete and elegant solution of the problem of distribution obtained by R. A. Fisher.

If it is not possible to determine the exact distribution of a test function under sufficiently general assumptions we may:

a) make simple and particular assumptions concerning the populations,

b) investigate an asymptotic solution of the problem, i.e. determine the distributions of the test functions for large samples,³ or

c) study the mathematical expectations and the variances of the test functions for small samples under appropriately general assumptions regarding the populations (this should be done independently of concepts of estimation, unbiased estimate etc.).

This last procedure provides us with tests which suffice in actual practice.⁴

¹ Research under a grant in aid of the American Philosophical Society.

² My italics.

³ cf. statement (a) page 366.

It is well known that the expectations of the two forms $s_a^2/(m - 1)$ and $s_w^2/m(n - 1)$ are the same even if the populations are not normal, but equal each other (Bernoulli series). In addition we shall prove the theorem, familiar in case of the Lexis quotient [9], that under these conditions the expectation of their quotients equals unity (Section 1, (b)). The next step consists in investigating certain inequalities characteristic of Lexis or Poisson series. The different criteria will be completed by the computation of the respective variances (Section 1, (c)).

In addition to the above mentioned test functions other symmetrical test functions have been considered [5]. In studying these we shall again assume general populations. It will be seen that the Lexis as well as the Poisson series may be characterized by equalities (instead of inequalities) (Section 2, (a)), and we can generalize our theorem on the expectation of the quotient (Section 2, (b)) to this case. Then the variances of these test functions will be investigated.

It seems worthwhile to omit the assumption of independence of the chance variables and to study different kinds of mutual dependence. These investigations lead to interesting relations among the expectations⁵ (Section 2, (c)). They seem to be related to Fisher's "intraclass correlation" and to supplement his idea.

Most of the results of Sections 1 and 2 can be generalized to the analysis of covariance (Section 3).


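1. Variance within and among classes.

(a). The test functions. Let $x_{\mu\nu}$ $(\mu = 1, \ldots, m;\ \nu = 1, \ldots, n)$ be $m \cdot n$ chance variables and put

(1)  $a_\mu = \frac{1}{n} \sum_{\nu=1}^{n} x_{\mu\nu}, \quad a = \frac{1}{m} \sum_{\mu=1}^{m} a_\mu = \frac{1}{mn} \sum_{\mu=1}^{m} \sum_{\nu=1}^{n} x_{\mu\nu}.$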


⁴ The important paper of Irwin [6] assumes normality of the populations. H. L. Rietz [8] computes the expectations of $s_a^2$ and $s_w^2$ under rather general assumptions for the populations and considers the cases of Bernoulli, Lexis, Poisson series, but does not consider tests of significance; nor does he consider the symmetric test functions (Section 2 of this paper). In later papers on our subject the assumption of normal and independent populations recurs. Another approach [11] in the problem of analysis of variance is to use ranks instead of the actual values (this has been pointed out by the referee to the author, who is very grateful for this comment).

⁵ They generalize previous results of the author.





We then introduce the three quadratic forms

(2)  $s^2 = \sum_\mu \sum_\nu (x_{\mu\nu} - a)^2; \quad s_a^2 = n \sum_\mu (a_\mu - a)^2; \quad s_w^2 = \sum_\mu \sum_\nu (x_{\mu\nu} - a_\mu)^2,$

with the respective ranks (degrees of freedom)

(3)  $r = mn - 1, \quad r_a = m - 1, \quad r_w = m(n - 1).$

Then we have

(4)  $s^2 = s_a^2 + s_w^2, \quad r = r_a + r_w.$
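The decomposition (4) can be confirmed on any table of numbers. The sketch below uses a hypothetical 3 by 4 table and a hypothetical function `quadratic_forms`; rational arithmetic keeps the check exact.

```python
from fractions import Fraction

def quadratic_forms(x):
    """Return (s^2, s_a^2, s_w^2) of (2) for a table x with m rows of n values."""
    m, n = len(x), len(x[0])
    row_means = [Fraction(sum(row)) / n for row in x]          # a_mu
    grand_mean = sum(row_means) / m                            # a
    s2 = sum((v - grand_mean) ** 2 for row in x for v in row)
    sa2 = n * sum((a - grand_mean) ** 2 for a in row_means)
    sw2 = sum((v - a) ** 2 for row, a in zip(x, row_means) for v in row)
    return s2, sa2, sw2

# Hypothetical observations: m = 3 groups of n = 4 values.
table = [[3, 5, 4, 6], [8, 7, 9, 8], [2, 4, 3, 3]]
s2, sa2, sw2 = quadratic_forms(table)
m, n = len(table), len(table[0])
ranks = (m * n - 1, m - 1, m * (n - 1))                        # (3)
print(s2, sa2, sw2, ranks)
```

For this table the within-rows form is 9 and the among-rows form is 158/3, and their sum reproduces the total form, as (4) requires.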

The $m \cdot n$ theoretical distributions are assumed in this section to be independent of each other. Let $V_{\mu\nu}(x)$ be the probability that $x_{\mu\nu} \le x$ and

(5)  $\alpha_{\mu\nu} = \int x \, dV_{\mu\nu}(x), \quad \sigma_{\mu\nu}^2 = \int (x - \alpha_{\mu\nu})^2 \, dV_{\mu\nu}(x),$

where the integrals are Stieltjes integrals; thus the $V_{\mu\nu}(x)$ may be e.g. general arithmetical or geometrical distributions.⁶

Let us compute the mathematical expectation of the three test functions with respect to the $m \cdot n$-dimensional distribution

$V_{11}(x_{11})\, V_{12}(x_{12}) \cdots V_{mn}(x_{mn}):$

(6)  $E[F(x_{11}, \ldots, x_{mn})] = \int \cdots \int F(x_{11}, \ldots, x_{mn}) \, dV_{11}(x_{11}) \cdots dV_{mn}(x_{mn}).$


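We have then, writing $\alpha_\mu = \frac{1}{n}\sum_\nu \alpha_{\mu\nu}$ and $\alpha = \frac{1}{mn}\sum_\mu \sum_\nu \alpha_{\mu\nu}$,

(7)  $E\Big[\frac{s^2}{mn - 1}\Big] = \frac{1}{mn} \sum_\mu \sum_\nu \sigma_{\mu\nu}^2 + \frac{1}{mn - 1} \sum_\mu \sum_\nu (\alpha_{\mu\nu} - \alpha)^2,$

(8)  $E\Big[\frac{s_a^2}{m - 1}\Big] = \frac{1}{mn} \sum_\mu \sum_\nu \sigma_{\mu\nu}^2 + \frac{n}{m - 1} \sum_\mu (\alpha_\mu - \alpha)^2,$

(9)  $E\Big[\frac{s_w^2}{m(n - 1)}\Big] = \frac{1}{mn} \sum_\mu \sum_\nu \sigma_{\mu\nu}^2 + \frac{1}{m(n - 1)} \sum_\mu \sum_\nu (\alpha_{\mu\nu} - \alpha_\mu)^2.$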

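From these equalities we deduce:

1. If the $m \cdot n$ theoretical mean values $\alpha_{\mu\nu}$ are all equal (Bernoulli series), then the expectations in (7), (8), (9) are equal; i.e.

(10)  $E\Big[\frac{s^2}{mn - 1}\Big] = E\Big[\frac{s_a^2}{m - 1}\Big] = E\Big[\frac{s_w^2}{m(n - 1)}\Big].$

2. If the $\alpha_{\mu\nu}$ are equal "by rows" but differ from row to row (Lexis series), i.e. $\alpha_{\mu\nu} = \alpha_\mu$ but $\alpha_\mu \ne \alpha$, then

(11)  $E\Big[\frac{s_a^2}{m - 1} - \frac{s^2}{mn - 1}\Big] = \Big(\frac{n}{m - 1} - \frac{n}{mn - 1}\Big) \sum_\mu (\alpha_\mu - \alpha)^2 > 0,$

(12)  $E\Big[\frac{s_a^2}{m - 1} - \frac{s_w^2}{m(n - 1)}\Big] = \frac{n}{m - 1} \sum_\mu (\alpha_\mu - \alpha)^2 > 0.$

3. If the $\alpha_{\mu\nu}$ are equal "by columns" but differ from column to column (Poisson series), then $\alpha_{\mu\nu} = \alpha_\nu$; $\alpha_\mu = \alpha$ and

(13)  $E\Big[\frac{s_a^2}{m - 1} - \frac{s^2}{mn - 1}\Big] = -\frac{m}{mn - 1} \sum_\nu (\alpha_\nu - \alpha)^2 < 0,$

(14)  $E\Big[\frac{s_a^2}{m - 1} - \frac{s_w^2}{m(n - 1)}\Big] = -\frac{1}{n - 1} \sum_\nu (\alpha_\nu - \alpha)^2 < 0.$

⁶ $V_{\mu\nu}(x)$ is a monotone non-decreasing function. Hence it has at most a denumerable set of ordinary jump discontinuities; at such a point it is continuous to the right but not to the left. Moreover it possesses a finite derivative $v_{\mu\nu}(x)$ almost everywhere.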


In the Lexis theory⁷ we speak of normal, supernormal or subnormal dispersion depending on whether the observed value of $\frac{s_a^2}{m-1}$ is equal, greater or less than that of $\frac{s^2}{mn-1}$, and we usually consider the quotient

$$L = \frac{s_a^2}{m-1} \Big/ \frac{s^2}{mn-1}. \tag{15}$$

In analysis of variance theory we usually compare $s_a^2/(m-1)$ (variance among rows) with $s_w^2/(m(n-1))$ (variance within rows) and introduce the quotient

$$F = \frac{s_a^2}{m-1} \Big/ \frac{s_w^2}{m(n-1)}. \tag{16}$$

It follows from (4): If $L \gtreqless 1$ then $F \gtreqless 1$ and conversely. We may therefore speak of normal or non-normal dispersion with respect either to $L$ or to $F$.
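
The step from (4) to this equivalence is short enough to write out. The following derivation is added for the reader's convenience and assumes only $s^2 > 0$ and $s_w^2 > 0$:

```latex
\begin{aligned}
L > 1 &\iff (mn-1)\, s_a^2 > (m-1)\,(s_a^2 + s_w^2)
        && \text{by } s^2 = s_a^2 + s_w^2 \\
      &\iff m(n-1)\, s_a^2 > (m-1)\, s_w^2 \\
      &\iff \frac{s_a^2}{m-1} > \frac{s_w^2}{m(n-1)}
       \iff F > 1 .
\end{aligned}
```

The same chain holds with $>$ replaced throughout by $=$ or by $<$.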

The results given by equations (10)-(14) can be expressed as follows: If the $m \cdot n$ theoretical distributions are all equal the mathematical expectations of $s^2/r$, of $s_a^2/r_a$ and of $s_w^2/r_w$ are equal. In the case of a Lexis series the expectation of $s_a^2/r_a$ is greater than that of $s^2/r$ and greater than that of $s_w^2/r_w$, and in the case of a Poisson series the opposite is true.
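
These three cases can be checked exactly on a toy model. The sketch below is an illustration and not part of the paper: it assumes that each $x_{\mu\nu}$ equals $\alpha_{\mu\nu} \pm 1$ with probability 1/2 (so that every $\sigma^2_{\mu\nu} = 1$), enumerates all outcomes of a $2 \times 3$ table, and averages the three mean squares with exact rational arithmetic. The function names are hypothetical.

```python
from fractions import Fraction
from itertools import product


def quadratic_forms(x):
    """Return (s^2, s_a^2, s_w^2) of formula (2) for an m-by-n table x."""
    m, n = len(x), len(x[0])
    row_means = [sum(row) / n for row in x]
    a = sum(row_means) / m  # general average (all rows have n entries)
    s2 = sum((v - a) ** 2 for row in x for v in row)
    sa2 = n * sum((am - a) ** 2 for am in row_means)
    sw2 = sum((v - am) ** 2 for row, am in zip(x, row_means) for v in row)
    return s2, sa2, sw2


def expected_mean_squares(alpha):
    """Exact E[s^2/r], E[s_a^2/r_a], E[s_w^2/r_w] by full enumeration.

    Assumption (illustrative only): x[mu][nu] = alpha[mu][nu] + e with
    e = -1 or +1, each with probability 1/2, all cells independent.
    """
    m, n = len(alpha), len(alpha[0])
    N = m * n
    totals = [Fraction(0), Fraction(0), Fraction(0)]
    for signs in product((-1, 1), repeat=N):
        x = [[Fraction(alpha[i][j]) + signs[i * n + j] for j in range(n)]
             for i in range(m)]
        for k, value in enumerate(quadratic_forms(x)):
            totals[k] += value
    ranks = (N - 1, m - 1, m * (n - 1))
    return tuple(t / 2 ** N / r for t, r in zip(totals, ranks))


bernoulli = expected_mean_squares([[0, 0, 0], [0, 0, 0]])  # all means equal
lexis = expected_mean_squares([[0, 0, 0], [2, 2, 2]])      # equal by rows
poisson = expected_mean_squares([[0, 1, 2], [0, 1, 2]])    # equal by columns
print(bernoulli, lexis, poisson)
```

In the Lexis table the middle entry (the expectation of $s_a^2/r_a$) is the largest of the three, in the Poisson table it is the smallest, and in the Bernoulli table all three coincide, as (10)-(14) require.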

We generally use these facts in order to make inferences about the unknown populations $V_{\mu\nu}(x)$ from the observed values of our test functions. If e.g., the observed value of $s_a^2/r_a$ is "significantly"⁸ greater than that of $s^2/r$ we may assume that the theoretical distributions form a Lexis series. But of course such a significant deviation can also be explained by quite different assumptions regarding the populations (see Section 2, (c)).

(b). Mathematical expectation of the quotient of the test functions. We are going 
to prove in this section a theorem of some mathematical interest. This theorem 
is a generalisation of an analogous theorem in the Lexis theory [9]. 


⁷ The relation between these considerations and the Lexis theory will be dealt with in another paper.

⁸ The meaning of the word "significantly" has of course still to be explained.





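We have seen (10) that the mathematical expectations, defined by (6), of the three test functions

$$S = \frac{s^2}{mn-1}, \qquad S' = \frac{s_a^2}{m-1}, \qquad S'' = \frac{s_w^2}{m(n-1)},$$

are equal if the $m \cdot n$ populations are equal (i.e. have identical distributions). We will show that even in this case

$$E\left(\frac{S'}{S}\right) = 1, \qquad E\left(\frac{S''}{S}\right) = 1. \tag{17}$$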

Let us put $m \cdot n = N$, and let the $N$ chance variables be arranged in a one-dimensional sequence. As $S'$ and $S$ are of second degree in the $x_\nu$ $(\nu = 1, 2, \cdots N)$ we may write

$$S' - S = A + \sum_{\nu} B_\nu x_\nu + \sum_{\nu} C_\nu x_\nu^2 + \sum_{\nu \neq \rho} D_{\nu\rho}\, x_\nu x_\rho$$

where the $A$, $B_\nu$, $C_\nu$ and $D_{\nu\rho}$ are constants. Now form the expectation, defined by (6), of $(S' - S)$ under the assumption that the $N$ populations are equal, $V_\nu(x) = V(x)$ $(\nu = 1 \cdots N)$. Denoting by $\alpha$ and $\sigma^2$ the mean value and variance of $V(x)$ and putting $\sum B_\nu = B$, $\sum C_\nu = C$, $\sum D_{\nu\rho} = D$ we find

$$E(S' - S) = A + B\alpha + C(\sigma^2 + \alpha^2) + D\alpha^2 = 0.$$

And as this equality holds for an arbitrary distribution $V(x)$, we deduce that $A = B = C = D = 0$. Let us then compute under the same assumption the expectation of $(S' - S)/S$. Now the expectations of $1/S$, $x_\nu/S$, $x_\nu^2/S$, $x_\nu x_\rho/S$ take the place of the expectations of $1$, $x_\nu$, $x_\nu^2$, $x_\nu x_\rho$. But these new expectations are also independent of the index $\nu$, because of the equality of the $N$ populations and the symmetry of $S$ in the $N$ variables $x_1, \cdots x_N$. Hence we may write

$$E\left(\frac{1}{S}\right) = y_0, \qquad E\left(\frac{x_\nu}{S}\right) = y_1, \qquad E\left(\frac{x_\nu^2}{S}\right) = y_2, \qquad E\left(\frac{x_\nu x_\rho}{S}\right) = y_3,$$

and we find

$$E\left(\frac{S' - S}{S}\right) = A y_0 + B y_1 + C y_2 + D y_3 = 0,$$

because $A = B = C = D = 0$. Hence $E(S'/S) = 1$.

We may prove in the same way that $E(S''/S) = 1$.

We have however proved (17) only under the assumption that all the $N$ populations are equal, whereas (10) is true under the mere hypothesis that the mean values of the populations $V_\nu(x)$ are the same.
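
Relation (17) can also be checked by a different route from the coefficient argument above; the following sketch is an added illustration, not the author's proof. When the $N$ populations are equal and independent, every arrangement of a given set of $N$ observed values in the table is equally likely, and $S$ does not change under rearrangement. It is therefore enough to verify that, for fixed values that are not all equal, the averages of $S'/S$ and $S''/S$ over all $N!$ arrangements equal 1. The function names and the sample values are hypothetical.

```python
from fractions import Fraction
from itertools import permutations


def mean_squares(values, m, n):
    """Return (S, S', S'') for integer values laid out row by row in an m-by-n table."""
    rows = [values[i * n:(i + 1) * n] for i in range(m)]
    row_means = [Fraction(sum(row), n) for row in rows]
    a = Fraction(sum(values), m * n)
    S = sum((v - a) ** 2 for v in values) / (m * n - 1)
    S1 = n * sum((am - a) ** 2 for am in row_means) / (m - 1)
    S2 = sum((v - am) ** 2
             for row, am in zip(rows, row_means) for v in row) / (m * (n - 1))
    return S, S1, S2


def average_ratios(values, m, n):
    """Average S'/S and S''/S over all arrangements of the given values."""
    total1 = total2 = Fraction(0)
    count = 0
    for arrangement in permutations(values):
        S, S1, S2 = mean_squares(arrangement, m, n)
        total1 += S1 / S
        total2 += S2 / S
        count += 1
    return total1 / count, total2 / count


print(average_ratios([1, 2, 4, 8, 16, 32], 2, 3))
```

The check is conditional on the observed values, so it applies to any common distribution $V(x)$ for which $S > 0$ with probability one.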

(c). The variances of the test functions. The distribution of our test functions and of their quotients $F$ or $L$ have been determined and tabulated by R. A. Fisher under the hypothesis that the $m \cdot n$ chance variables are independent and obey the same normal Gaussian law. Consequently by means of Fisher's distribution we can test only the hypothesis that the theoretical populations have both these properties.

If in a statistical problem it is not possible to determine the exact distributions of the test functions under sufficiently general assumptions regarding the populations, one of the following procedures is frequently used:

a) One tries to find an asymptotic solution of the problem, i.e. to determine the distribution of the test functions in question for large samples. The distribution of the analysis of variance quotient, as $n$ tends to infinity, has been established by W. G. Madow [6]. The same problem for the Lexis quotient was solved as early as 1873 by Helmert [4]. As $m$ tends to infinity the limiting distribution is a Gaussian distribution, which follows from general theorems of v. Mises [7].

b) For small samples, i.e. if $m$ and $n$ are finite, we may determine the expectations and the variances of the test functions for appropriately general populations and establish in this way a test of significance.

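In this section we shall compute the variances of our test functions. Let us first assume arbitrary but equal populations $V_{\mu\nu}(x) = V(x)$ and denote by $M_i$ the $i$th moment about the mean of $V(x)$:

$$M_i = \int (x - \alpha)^i \, dV(x), \quad (i = 1, 2, \cdots); \qquad \alpha = \int x \, dV(x), \qquad M_2 = \sigma^2. \tag{18}$$

Then we find immediately the variance of $S = \frac{s^2}{mn-1}$ using a well-known formula for the variance of a sample variance:

$$\mathrm{Var}\left[\frac{s^2}{mn-1}\right] = \mathrm{Var}\left[\frac{\sum\sum (x_{\mu\nu} - a)^2}{mn-1}\right] = \frac{1}{mn}\left[M_4 - \frac{mn-3}{mn-1}\, M_2^2\right]. \tag{19}$$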


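If we need the analogous variance in case of different populations we let

$$f^2 = \frac{1}{r}\sum_{\rho=1}^{r} (y_\rho - b)^2, \qquad \text{where } b = \frac{1}{r}(y_1 + \cdots + y_r),$$

and let $V_\rho(y)$, $(\rho = 1, \cdots r)$, be $r$ populations, and

$$\beta_\rho = \int y \, dV_\rho(y), \qquad \beta = \frac{1}{r}\sum_{\rho=1}^{r} \beta_\rho, \qquad \int (y - \beta_\rho)^i \, dV_\rho(y) = \mu_i^{(\rho)}, \quad (i = 1, 2, \cdots;\ \rho = 1, 2, \cdots r), \qquad \mu_2^{(\rho)} = v_\rho.$$

Then the following formula may be used: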


Var (f“) 


(20) 






+ 4 Lni t + ^ vj . 

r p—i. p-i ' ?<’■ 





We may check (20) by putting the $V_\rho(y)$ all equal to $V(y)$ and find

$$\mathrm{Var}(f^2) = \frac{r-1}{r^3}\left[(r-1)\mu_4 - (r-3)\mu_2^2\right], \tag{20'}$$

in accordance with (19).

In order to determine the variance of $s_a^2$ by means of these formulae we consider $\sum_{\mu=1}^{m} (a_\mu - a)^2$ as a sample variance. The $n$ distributions in the $\mu$th row are $V_{\mu 1}(x), V_{\mu 2}(x), \cdots V_{\mu n}(x)$. Or, if we assume that they are all equal, simply $V_{\mu\nu}(x) = V(x)$. Let us put $\frac{1}{n} x_{\mu\nu} = z_{\mu\nu}$, and $V(x_{\mu\nu}) = V'(z_{\mu\nu})$, and denote by $W(a_\mu)$ the distribution of the average of the elements in the $\mu$th row:

$$W(a_\mu) = \int \cdots \int dV'(z_{\mu 1})\, dV'(z_{\mu 2}) \cdots dV'(z_{\mu, n-1})\, V'(a_\mu - z_{\mu 1} - \cdots - z_{\mu, n-1}).$$

There is such a distribution for each row, and we have to find the variance of $\sum (a_\mu - a)^2$ with respect to the combination of these $m$ distributions. In order to be able to apply (20') we need the second and fourth moments of these distributions. We have for the mean value $\alpha'$ of $W(a_\mu)$:

$$\alpha' = n \cdot (\text{mean value of } V') = n \cdot \frac{\alpha}{n} = \alpha$$

and for the variance of $W(a_\mu)$: $\mu_2' = \frac{\sigma^2}{n}$. We still need $\mu_4'$. By repeated use of the formula

$$\int\!\!\int \left[(x_1 - \alpha_1) + (x_2 - \alpha_2)\right]^4 dV(x_1)\, dV(x_2) = \int (x_1 - \alpha_1)^4 dV(x_1) + \int (x_2 - \alpha_2)^4 dV(x_2) + 6 \int (x_1 - \alpha_1)^2 dV(x_1) \int (x_2 - \alpha_2)^2 dV(x_2),$$

and of the fact that $W(a_\mu)$ is simply the distribution of the sum of $n$ variables $z_{\mu\nu}$ we get:

$$\mu_4' = \frac{1}{n^3}\left(M_4 + 3(n-1) M_2^2\right)$$

where $M_4$ and $M_2$ are the values introduced in (18).

We now apply (20') and get

$$\mathrm{Var}\left[\sum_{\mu} (a_\mu - a)^2\right] = \frac{m-1}{m}\left[(m-1)\mu_4' - (m-3)\mu_2'^2\right],$$

and substituting the values of $\mu_4'$ and $\mu_2'$, we find by an easy computation the final result:

$$\mathrm{Var}\left[\frac{s_a^2}{m-1}\right] = \mathrm{Var}\left[\frac{n \sum (a_\mu - a)^2}{m-1}\right] = \frac{1}{mn}\left(M_4 - 3M_2^2\right) + \frac{2}{m-1} M_2^2. \tag{21}$$

If we compare this last formula with (19) we see that the right side in (21) is of order $1/m$, whereas that in (19) is of order $1/mn$. Therefore, for sufficiently large values of $n$, $s^2/r$ will be "more exact" than $s_a^2/r_a$. In some presentations of the Lexis theory it is implied that the value $s_a^2/r_a$ is to be compared with the theoretical or exact value $s^2/r$; we may see a certain justification for this idea in the result just mentioned. This may lead us also to use $s^2/r$ as an unbiased estimate of the unknown population variance if $\alpha_{\mu\nu} = \alpha$ (see (7) and (8)).

By means of the simple formulae (19) and (21) we can now easily test whether the values of $s^2/r$ and $s_a^2/r_a$, whose expectations are equal in case of equal populations, differ significantly from each other. Of course we must compute as usual approximate values of $M_2$ and $M_4$ from the observations. If $n$ is comparatively large, as it usually is e.g. in the Lexis theory, only the term $\frac{2}{m-1} M_2^2$ will be significant. If the hypothetical population is Gaussian ($M_4 = 3M_2^2$) the right side of (21) reduces to $\frac{2}{m-1} M_2^2$ and that of (19) to $\frac{2 M_2^2}{mn-1}$; hence these variances are in the ratio of $\frac{1}{r_a} : \frac{1}{r}$, as one might expect.
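
Formulae (19) and (21) are exact for any common population with finite fourth moment, which makes them easy to verify on a small discrete example. The sketch below is an added illustration with hypothetical function names: it assumes each observation is drawn with equal probability from a short list of integers, enumerates every possible table, and compares the enumerated variances with the two closed forms.

```python
from fractions import Fraction
from itertools import product


def enumerated_variances(support, m, n):
    """Exact Var[s^2/(mn-1)] and Var[s_a^2/(m-1)] by enumerating all tables.

    Assumption (illustrative only): the m*n observations are independent
    and each takes the integer values in `support` with equal probability.
    """
    N = m * n
    weight = Fraction(1, len(support) ** N)
    e_S = e_S_sq = e_Sa = e_Sa_sq = Fraction(0)
    for xs in product(support, repeat=N):
        a = Fraction(sum(xs), N)
        row_means = [Fraction(sum(xs[i * n:(i + 1) * n]), n) for i in range(m)]
        S = sum((x - a) ** 2 for x in xs) / (N - 1)
        Sa = n * sum((am - a) ** 2 for am in row_means) / (m - 1)
        e_S += weight * S
        e_S_sq += weight * S ** 2
        e_Sa += weight * Sa
        e_Sa_sq += weight * Sa ** 2
    return e_S_sq - e_S ** 2, e_Sa_sq - e_Sa ** 2


def formula_variances(support, m, n):
    """Right sides of (19) and (21) for the same population."""
    k = len(support)
    alpha = Fraction(sum(support), k)
    M2 = sum((x - alpha) ** 2 for x in support) / k
    M4 = sum((x - alpha) ** 4 for x in support) / k
    N = m * n
    var_19 = (M4 - Fraction(N - 3, N - 1) * M2 ** 2) / N
    var_21 = (M4 - 3 * M2 ** 2) / N + Fraction(2, m - 1) * M2 ** 2
    return var_19, var_21


print(enumerated_variances((0, 1, 3), 2, 3))
print(formula_variances((0, 1, 3), 2, 3))
```

The enumeration grows as $k^{mn}$ for a support of $k$ values, so the check is only practical for very small tables.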


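2. Symmetric Test Functions.

(a). New equalities for Lexis and Poisson series. In Section 1, starting with the formula $s^2 = s_a^2 + s_w^2$ we used the test functions $s^2/r$, $s_a^2/r_a$, $s_w^2/r_w$. This implied a difference between rows and columns, which is often justified, e.g. in the Lexis theory. The following decomposition of $s^2$ is symmetric with respect to rows and columns. Let

$$a_\mu = \frac{1}{n}\sum_{\nu=1}^{n} x_{\mu\nu}, \qquad \bar a_\nu = \frac{1}{m}\sum_{\mu=1}^{m} x_{\mu\nu}, \qquad a = \frac{1}{mn}\sum_{\mu}\sum_{\nu} x_{\mu\nu} = \frac{1}{m}\sum_{\mu=1}^{m} a_\mu = \frac{1}{n}\sum_{\nu=1}^{n} \bar a_\nu \tag{1}$$

and

$$\begin{aligned}
s^2 &= \sum\sum (x_{\mu\nu} - a)^2, & s_a^2 &= n \sum (a_\mu - a)^2, & s_w^2 &= \sum\sum (x_{\mu\nu} - a_\mu)^2, \\
S^2 &= \sum\sum (x_{\mu\nu} - a_\mu - \bar a_\nu + a)^2, & \bar s_a^2 &= m \sum (\bar a_\nu - a)^2, & \bar s_w^2 &= \sum\sum (x_{\mu\nu} - \bar a_\nu)^2
\end{aligned} \tag{2}$$

with the respective ranks

$$\begin{aligned}
r &= mn - 1, & r_a &= m - 1, & r_w &= m(n - 1), \\
R &= (m - 1)(n - 1), & \bar r_a &= n - 1, & \bar r_w &= n(m - 1).
\end{aligned} \tag{3}$$

Then

$$s^2 = S^2 + s_a^2 + \bar s_a^2 = s_a^2 + s_w^2 = \bar s_a^2 + \bar s_w^2 \tag{4}$$

and

$$r = R + r_a + \bar r_a = r_a + r_w = \bar r_a + \bar r_w. \tag{5}$$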

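We find the expectations of these forms under the assumption of arbitrary populations $V_{\mu\nu}(x)$ which are independent and different from each other. We then specialize for Bernoulli series, Lexis and Poisson series of populations respectively. Denoting by $\alpha_{\mu\nu}$ and $\sigma_{\mu\nu}^2$ the mean value and variance of $V_{\mu\nu}(x)$ and by

$$\alpha_\mu = \frac{1}{n}\sum_{\nu} \alpha_{\mu\nu}, \qquad \bar\alpha_\nu = \frac{1}{m}\sum_{\mu} \alpha_{\mu\nu}, \qquad \alpha = \frac{1}{mn}\sum_{\mu}\sum_{\nu} \alpha_{\mu\nu} = \frac{1}{m}\sum_{\mu} \alpha_\mu = \frac{1}{n}\sum_{\nu} \bar\alpha_\nu, \tag{6}$$

we find for the expected values defined in (6) Section 1:

$$\begin{aligned}
E\left[\frac{s^2}{mn-1}\right] &= \frac{1}{mn}\sum\sum \sigma_{\mu\nu}^2 + \frac{1}{mn-1}\sum\sum (\alpha_{\mu\nu} - \alpha)^2, \\
E\left[\frac{s_a^2}{m-1}\right] &= \frac{1}{mn}\sum\sum \sigma_{\mu\nu}^2 + \frac{n}{m-1}\sum_{\mu} (\alpha_\mu - \alpha)^2, \\
E\left[\frac{\bar s_a^2}{n-1}\right] &= \frac{1}{mn}\sum\sum \sigma_{\mu\nu}^2 + \frac{m}{n-1}\sum_{\nu} (\bar\alpha_\nu - \alpha)^2, \\
E\left[\frac{S^2}{(m-1)(n-1)}\right] &= \frac{1}{mn}\sum\sum \sigma_{\mu\nu}^2 + \frac{1}{(m-1)(n-1)}\sum\sum (\alpha_{\mu\nu} - \alpha_\mu - \bar\alpha_\nu + \alpha)^2, \\
E\left[\frac{s_w^2}{m(n-1)}\right] &= \frac{1}{mn}\sum\sum \sigma_{\mu\nu}^2 + \frac{1}{m(n-1)}\sum\sum (\alpha_{\mu\nu} - \alpha_\mu)^2, \\
E\left[\frac{\bar s_w^2}{n(m-1)}\right] &= \frac{1}{mn}\sum\sum \sigma_{\mu\nu}^2 + \frac{1}{n(m-1)}\sum\sum (\alpha_{\mu\nu} - \bar\alpha_\nu)^2.
\end{aligned} \tag{7}$$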


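In the Bernoulli case, which as far as the author knows is the only one which has been considered in this connection [5], we get the well-known result:

$$E_B\left[\frac{s^2}{mn-1}\right] = E_B\left[\frac{s_a^2}{m-1}\right] = E_B\left[\frac{\bar s_a^2}{n-1}\right] = E_B\left[\frac{s_w^2}{m(n-1)}\right] = E_B\left[\frac{\bar s_w^2}{n(m-1)}\right] = E_B\left[\frac{S^2}{(m-1)(n-1)}\right] = \frac{1}{mn}\sum\sum \sigma_{\mu\nu}^2. \tag{8}$$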


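Now let us assume a Lexis series, with

$$\alpha_{\mu\nu} = \alpha_\mu, \quad \sigma_{\mu\nu}^2 = \sigma_\mu^2; \qquad \alpha_\mu \not\equiv \alpha, \quad \bar\alpha_\nu = \alpha. \tag{9}$$

Then (7) reduces to

$$\begin{aligned}
E_L\left[\frac{s^2}{mn-1}\right] &= \frac{1}{m}\sum_{\mu} \sigma_\mu^2 + \frac{n}{mn-1}\sum_{\mu} (\alpha_\mu - \alpha)^2, \\
E_L\left[\frac{s_a^2}{m-1}\right] &= \frac{1}{m}\sum_{\mu} \sigma_\mu^2 + \frac{n}{m-1}\sum_{\mu} (\alpha_\mu - \alpha)^2, \\
E_L\left[\frac{\bar s_a^2}{n-1}\right] &= E_L\left[\frac{S^2}{(m-1)(n-1)}\right] = E_L\left[\frac{s_w^2}{m(n-1)}\right] = \frac{1}{m}\sum_{\mu} \sigma_\mu^2, \\
E_L\left[\frac{\bar s_w^2}{n(m-1)}\right] &= \frac{1}{m}\sum_{\mu} \sigma_\mu^2 + \frac{1}{m-1}\sum_{\mu} (\alpha_\mu - \alpha)^2.
\end{aligned} \tag{10}$$

From these formulae we deduce, besides the inequalities (11), (12) of Section 1 and the corresponding formulae where the role of rows and columns is interchanged, the further inequalities:

$$E_L\left[\frac{s_a^2}{m-1}\right] > E_L\left[\frac{\bar s_w^2}{n(m-1)}\right] > E_L\left[\frac{S^2}{(m-1)(n-1)}\right]. \tag{11}$$

But there are also characteristic equalities, namely

$$E_L\left[\frac{\bar s_a^2}{n-1}\right] = E_L\left[\frac{S^2}{(m-1)(n-1)}\right] = E_L\left[\frac{s_w^2}{m(n-1)}\right]. \tag{12}$$

These equalities⁹ seem often to be more appropriate than the usual inequalities in testing the hypothesis of a Lexis series.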

Let us finally consider the Poisson case which is very often neglected. There we have:

$$\alpha_{\mu\nu} = \bar\alpha_\nu, \quad \sigma_{\mu\nu}^2 = \bar\sigma_\nu^2; \qquad \bar\alpha_\nu \not\equiv \alpha, \quad \alpha_\mu = \alpha. \tag{13}$$

Then, beside the inequalities (13), (14) of Section 1 and the corresponding ones where the role of rows and columns is interchanged, we find the new inequality:


(14) 




{a, — af < 0 , 


which of course corresponds to the Lexis inequality (11). The characteristic equalities are now:

$$E_P\left[\frac{s_a^2}{m-1}\right] = E_P\left[\frac{S^2}{(m-1)(n-1)}\right] = E_P\left[\frac{\bar s_w^2}{n(m-1)}\right]. \tag{15}$$

These equalities (12) and (15) can be used in testing the hypothesis of Lexis or Poisson series respectively in the same way as the equalities (8) for the Bernoulli case. We shall deal with the variances of these test functions in (d) of this section.
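
The symmetric decomposition and the characteristic equalities can be checked exactly on a small table. The sketch below is an added illustration with hypothetical names: it assumes $x_{\mu\nu} = \alpha_{\mu\nu} \pm 1$ with probability 1/2 in every cell (so each $\sigma^2_{\mu\nu} = 1$), verifies $s^2 = S^2 + s_a^2 + \bar s_a^2$ outcome by outcome, and returns the exact expectations of the six mean squares of (7).

```python
from fractions import Fraction
from itertools import product


def symmetric_forms(x):
    """Return the six quadratic forms of (2), Section 2, for an m-by-n table x."""
    m, n = len(x), len(x[0])
    cells = [(i, j) for i in range(m) for j in range(n)]
    row = [sum(x[i]) / n for i in range(m)]                        # a_mu
    col = [sum(x[i][j] for i in range(m)) / m for j in range(n)]   # a-bar_nu
    a = sum(row) / m
    return {
        "s2": sum((x[i][j] - a) ** 2 for i, j in cells),
        "sa2": n * sum((r - a) ** 2 for r in row),
        "sa2_bar": m * sum((c - a) ** 2 for c in col),
        "S2": sum((x[i][j] - row[i] - col[j] + a) ** 2 for i, j in cells),
        "sw2": sum((x[i][j] - row[i]) ** 2 for i, j in cells),
        "sw2_bar": sum((x[i][j] - col[j]) ** 2 for i, j in cells),
    }


def expected_symmetric_mean_squares(alpha):
    """Exact expectations of the six mean squares by full enumeration.

    Assumption (illustrative only): independent cells, each equal to
    alpha[mu][nu] - 1 or alpha[mu][nu] + 1 with probability 1/2.
    """
    m, n = len(alpha), len(alpha[0])
    ranks = {"s2": m * n - 1, "sa2": m - 1, "sa2_bar": n - 1,
             "S2": (m - 1) * (n - 1), "sw2": m * (n - 1), "sw2_bar": n * (m - 1)}
    totals = dict.fromkeys(ranks, Fraction(0))
    outcomes = 0
    for signs in product((-1, 1), repeat=m * n):
        x = [[Fraction(alpha[i][j]) + signs[i * n + j] for j in range(n)]
             for i in range(m)]
        forms = symmetric_forms(x)
        # Decomposition (4) holds for every single outcome.
        assert forms["s2"] == forms["S2"] + forms["sa2"] + forms["sa2_bar"]
        for key in totals:
            totals[key] += forms[key]
        outcomes += 1
    return {key: totals[key] / outcomes / ranks[key] for key in ranks}


print(expected_symmetric_mean_squares([[0, 0, 0], [2, 2, 2]]))  # Lexis table
print(expected_symmetric_mean_squares([[0, 1, 2], [0, 1, 2]]))  # Poisson table
```

For the Lexis table the three mean squares of (12) share one expectation while those of (11) are strictly ordered; for the Poisson table the three mean squares of (15) coincide.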

(b). Mathematical expectations of the quotients of certain test functions. We have seen that in case of a Lexis series the expectations of $\frac{\bar s_a^2}{n-1}$, of $\frac{S^2}{(m-1)(n-1)}$ and of $\frac{s_w^2}{m(n-1)}$ are equal. We will show that even in this case


⁹ See [10] pp. 81-90 for proofs of these inequalities for the case of normal populations.





$$E_L\left[\frac{\bar s_a^2}{n-1} \Big/ \frac{s_w^2}{m(n-1)}\right] = 1, \qquad E_L\left[\frac{\bar s_a^2}{n-1} \Big/ \frac{S^2}{(m-1)(n-1)}\right] = 1, \qquad E_L\left[\frac{S^2}{(m-1)(n-1)} \Big/ \frac{s_w^2}{m(n-1)}\right] = 1. \tag{16}$$

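Let us write for the moment: $\frac{\bar s_a^2}{n-1} = \bar T$ and $\frac{S^2}{(m-1)(n-1)} = T$. As both $\bar T$ and $T$ are of second degree in the $x_{\mu\nu}$, we may write:

$$\bar T - T = A + \sum_{\mu,\nu} B_{\mu\nu}\, x_{\mu\nu} + \sum_{\mu,\nu} C_{\mu\nu}\, x_{\mu\nu}^2 + \sum D_{\mu\nu,\rho\sigma}\, x_{\mu\nu} x_{\rho\sigma}$$

where the $A$, $B$, $C$, $D$ are constants. The last sum contains $\frac{1}{2} mn(mn - 1)$ terms, and $\mu = \rho$ and $\nu = \sigma$ do not both hold. Compute the expectation of $\bar T - T$ with respect to populations which form a Lexis series, $V_{\mu\nu}(x) = V_\mu(x)$. Denote by $\alpha_\mu$, $\sigma_\mu^2$ the respective mean values and variances. We then have because of (12):

$$0 = E_L[\bar T - T] = A + \sum_{\mu} \alpha_\mu \sum_{\nu} B_{\mu\nu} + \sum_{\mu} (\sigma_\mu^2 + \alpha_\mu^2) \sum_{\nu} C_{\mu\nu} + \sum_{\mu,\rho} \alpha_\mu \alpha_\rho \sum_{\nu,\sigma} D_{\mu\nu,\rho\sigma}$$

or, introducing $\sum_{\nu} B_{\mu\nu} = B_\mu$; $\sum_{\nu} C_{\mu\nu} = C_\mu$; $\sum_{\nu,\sigma} D_{\mu\nu,\rho\sigma} = D_{\mu\rho}$,

$$0 = E_L[\bar T - T] = A + \sum_{\mu} B_\mu \alpha_\mu + \sum_{\mu} C_\mu (\sigma_\mu^2 + \alpha_\mu^2) + \sum_{\mu,\rho} D_{\mu\rho}\, \alpha_\mu \alpha_\rho.$$

As this equality is exact for an arbitrary set of $V_\mu(x)$ we deduce that $A = 0$, $B_\mu = 0$, $C_\mu = 0$, $D_{\mu\rho} = 0$.

Let us now compute under the same assumption the expectation of $(\bar T - T)/T$. Here the expectations of $1/T$, $x_{\mu\nu}/T$ etc. will take the place of the expectations of $1$, $x_{\mu\nu}$, $\cdots$. But these new expectations will not depend on the index $\nu$ (index within the row) because the populations are the same within each row and because of the symmetry of $T$ in the $m \cdot n$ variables $x_{\mu\nu}$. Hence we can put

$$E_L\left(\frac{x_{\mu 1}}{T}\right) = E_L\left(\frac{x_{\mu 2}}{T}\right) = \cdots = E_L\left(\frac{x_{\mu n}}{T}\right), \quad \text{etc.}$$

and we get

$$E_L\left(\frac{\bar T}{T} - 1\right) = A\, E_L\left(\frac{1}{T}\right) + \sum_{\mu} B_\mu\, E_L\left(\frac{x_{\mu 1}}{T}\right) + \sum_{\mu} C_\mu\, E_L\left(\frac{x_{\mu 1}^2}{T}\right) + \cdots = 0,$$

because all the coefficients are equal to zero. Our theorem is thus proved. The same conclusion holds if the denominator, without being symmetric in all the $m \cdot n$ variables, does not depend on the row index. And as this last property holds for $s_w^2$, the expectations (16) are all shown to be equal to one.

Analogous relations are valid for Poisson series.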

Analogous relations are valid for Poisson series. 

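(c). Non-independent populations. We omit in this section the assumption of independence of the $m \cdot n$ populations but assume the theoretical population to be a general $m \cdot n$-variate distribution:

$$V(x_{11}, x_{12}, \cdots x_{mn}). \tag{17'}$$

From $V(x_{11}, x_{12}, \cdots x_{mn})$ we derive the $mn$ one-dimensional distributions $V_{\mu\nu}(x)$ $(\mu = 1, \cdots m;\ \nu = 1, \cdots n)$ by letting all the variables except $x_{\mu\nu}$ tend to $+\infty$, because $V_{\mu\nu}(x)$ is the probability that $x_{\mu\nu} \leq x$ regardless of the values of the other variables. In a similar way we derive the $\frac{1}{2} mn(mn - 1)$ two-dimensional distributions $V_{\mu_1\nu_1, \mu_2\nu_2}(x, y)$, that is the probability that $x_{\mu_1\nu_1} \leq x$ and $x_{\mu_2\nu_2} \leq y$. We get this distribution from (17') if all the variables with the exception of $x_{\mu_1\nu_1}$ and $x_{\mu_2\nu_2}$ tend to $+\infty$. We denote as before by $\alpha_{\mu\nu}$ and $\sigma_{\mu\nu}^2$ the expectation of $x_{\mu\nu}$ and $(x_{\mu\nu} - \alpha_{\mu\nu})^2$ respectively. But the expectation of $(x_{\mu_1\nu_1} - \alpha_{\mu_1\nu_1})(x_{\mu_2\nu_2} - \alpha_{\mu_2\nu_2})$, which was zero in case of the independence of $x_{\mu_1\nu_1}$ and $x_{\mu_2\nu_2}$, may now differ from zero. Denote by $\mathcal{E}$ the expectation with respect to (17'). Then:

$$\begin{aligned}
\mathcal{E}\left[(x_{\mu_1\nu_1} - \alpha_{\mu_1\nu_1})(x_{\mu_2\nu_2} - \alpha_{\mu_2\nu_2})\right] &= \int \cdots \int (x_{\mu_1\nu_1} - \alpha_{\mu_1\nu_1})(x_{\mu_2\nu_2} - \alpha_{\mu_2\nu_2}) \, dV(x_{11}, \cdots x_{mn}) \\
&= \int\!\!\int (x - \alpha_{\mu_1\nu_1})(y - \alpha_{\mu_2\nu_2}) \, dV_{\mu_1\nu_1, \mu_2\nu_2}(x, y) = R_{\mu_1\nu_1, \mu_2\nu_2}.
\end{aligned} \tag{17}$$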

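Let us first deduce a general formula for the expectation of a sample variance in the case of dependent populations. Let $P(y_1, \cdots y_r)$ be the distribution of $r$ chance variables $y_1, \cdots y_r$ which have the average $b$. Denoting by $\beta_\rho$ the expectation of $y_\rho$ with respect to $P$, by $\beta$ the average of the $\beta_\rho$, by $\tau_\rho^2$ the expectation of $(y_\rho - \beta_\rho)^2$, by $R_{ij}$ that of $(y_i - \beta_i)(y_j - \beta_j)$ we find, without difficulty, for the expectation of the sample variance

$$\begin{aligned}
\mathrm{Exp.}\left[\frac{1}{r}\sum_{\rho=1}^{r} (y_\rho - b)^2\right] &= \frac{1}{r} \int \cdots \int \left[(y_1 - b)^2 + \cdots + (y_r - b)^2\right] dP(y_1, \cdots y_r) \\
&= \frac{r-1}{r^2}\sum_{\rho=1}^{r} \tau_\rho^2 + \frac{1}{r}\sum_{\rho} (\beta_\rho - \beta)^2 - \frac{2}{r^2}\sum_{i<j} R_{ij}.
\end{aligned} \tag{18}$$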

Let us apply this result in the computation of the expectations of our test functions. It is not difficult to compute them in the general case of different mean values and variances. But we restrict ourselves to the consideration of certain particular cases. Take first the case where all the $m \cdot n$ mean values are equal to each other and likewise the $m \cdot n$ variances and the $\frac{1}{2} mn(mn - 1)$ covariances. Denoting these magnitudes by $\alpha$, $\sigma^2$ and $R$, respectively, we see from (18) that:

$$\mathcal{E}\left(\frac{s^2}{mn-1}\right) = \mathcal{E}\left(\frac{s_a^2}{m-1}\right) = \mathcal{E}\left(\frac{\bar s_a^2}{n-1}\right) = \mathcal{E}\left(\frac{s_w^2}{m(n-1)}\right) = \mathcal{E}\left(\frac{\bar s_w^2}{n(m-1)}\right) = \mathcal{E}\left(\frac{S^2}{(m-1)(n-1)}\right) = \sigma^2 - R. \tag{19}$$

We have thus obtained the result that in the case of dependent populations, just described, the expectations of the six different test functions are still the same.
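
Formula (18), on which this result rests, is an identity for any joint distribution with finite second moments, so it can be checked exactly on a small dependent example. The sketch below is an added illustration with hypothetical names: it assumes $y_\rho = \beta_\rho + u + e_\rho$, where the shared term $u$ and the individual terms $e_\rho$ each equal $\pm 1$ with probability 1/2, which gives every pair of variables the same covariance. Both sides of (18) are computed from the enumerated joint distribution.

```python
from fractions import Fraction
from itertools import product


def joint_shared_shock(betas):
    """Joint distribution of y_rho = beta_rho + u + e_rho as (probability, outcome) pairs.

    Illustrative assumption: u and every e_rho are independent and equal
    to -1 or +1 with probability 1/2, so all pairs have covariance 1.
    """
    r = len(betas)
    outcomes = []
    for u, *es in product((-1, 1), repeat=r + 1):
        outcomes.append(tuple(Fraction(b) + u + e for b, e in zip(betas, es)))
    p = Fraction(1, len(outcomes))
    return [(p, y) for y in outcomes]


def left_side(joint):
    """Expectation of (1/r) * sum (y_rho - b)^2 computed directly."""
    total = Fraction(0)
    for p, y in joint:
        r = len(y)
        b = sum(y) / r
        total += p * sum((v - b) ** 2 for v in y) / r
    return total


def right_side(joint):
    """Right side of (18) computed from means, variances and covariances."""
    r = len(joint[0][1])
    beta = [sum(p * y[i] for p, y in joint) for i in range(r)]
    beta_bar = sum(beta) / r

    def cov(i, j):
        return sum(p * (y[i] - beta[i]) * (y[j] - beta[j]) for p, y in joint)

    tau2_sum = sum(cov(i, i) for i in range(r))
    cross_sum = sum(cov(i, j) for i in range(r) for j in range(i + 1, r))
    return (Fraction(r - 1, r * r) * tau2_sum
            + sum((b - beta_bar) ** 2 for b in beta) / r
            - Fraction(2, r * r) * cross_sum)


joint = joint_shared_shock([0, 1, 2])
print(left_side(joint), right_side(joint))
```

With equal $\beta_\rho$ the same model is an instance of the case leading to (19): the variances are 2, the covariances are 1, and the expected sample variance, divided by $(r-1)/r$, equals $\sigma^2 - R = 1$.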

Of course we may assume many other particular kinds of mutual dependence of the populations. The following assumption seems to be appropriate for problems where rows and columns play a different role: We consider dependence only within each row, that means we assume only the variables $x_{\mu 1}, x_{\mu 2}, \cdots x_{\mu n}$ as mutually dependent. The distribution (17') has then the following form:

$$V_1(x_{11}, \cdots x_{1n})\, V_2(x_{21}, \cdots x_{2n}) \cdots V_m(x_{m1}, \cdots x_{mn}). \tag{20}$$

In the usual way we derive the $m \cdot n$ one-dimensional distributions $V_{\mu\nu}(x)$ and the $\frac{1}{2} mn(mn - 1)$ two-dimensional distributions $V_{\mu_1\nu_1, \mu_2\nu_2}(x, y)$. If $\mu_1 \neq \mu_2$, such a two-dimensional distribution reduces to the product of the respective one-dimensional distributions. Only the $\frac{1}{2} mn(n - 1)$ bivariate distributions derived from one and the same $V_\mu(x_{\mu 1}, \cdots x_{\mu n})$ will not reduce in this way.

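Denoting again by $\mathcal{E}$ the expectation with respect to $V(x_{11}, \cdots x_{mn})$ we find:

$$\mathcal{E}\left[(x_{\mu_1 i} - \alpha_{\mu_1 i})(x_{\mu_2 j} - \alpha_{\mu_2 j})\right] = \begin{cases} 0 & \text{if } \mu_1 \neq \mu_2, \\ R_{ij}^{(\mu)} & \text{if } \mu_1 = \mu_2 = \mu \text{ and } i \neq j. \end{cases} \tag{21}$$

Applying now formula (18) in the computation of the expectations of $s^2$, $s_w^2$ and $s_a^2$ we find:

$$\begin{aligned}
\mathcal{E}\left[\sum\sum (x_{\mu\nu} - a)^2\right] &= \frac{mn-1}{mn}\sum\sum \sigma_{\mu\nu}^2 + \sum\sum (\alpha_{\mu\nu} - \alpha)^2 - \frac{2}{mn}\sum_{\mu=1}^{m}\sum_{i<j} R_{ij}^{(\mu)}, \\
\mathcal{E}\left[\sum\sum (x_{\mu\nu} - a_\mu)^2\right] &= \frac{n-1}{n}\sum\sum \sigma_{\mu\nu}^2 + \sum\sum (\alpha_{\mu\nu} - \alpha_\mu)^2 - \frac{2}{n}\sum_{\mu=1}^{m}\sum_{i<j} R_{ij}^{(\mu)}, \\
\mathcal{E}\left[n \sum (a_\mu - a)^2\right] &= \frac{m-1}{mn}\sum\sum \sigma_{\mu\nu}^2 + n\sum (\alpha_\mu - \alpha)^2 + \frac{2(m-1)}{mn}\sum_{\mu=1}^{m}\sum_{i<j} R_{ij}^{(\mu)}.
\end{aligned} \tag{22}$$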





Let us now suppose that all the $m \cdot n$ distributions are equal to each other, or, at least, that

$$\alpha_{\mu\nu} = \alpha, \qquad \sigma_{\mu\nu}^2 = \sigma^2. \tag{23}$$

This assumption, which is characterized by (21), is, of course, different from the one which leads us to (19). We find now by means of (22), if we set

$$\sum_{\mu=1}^{m}\sum_{i<j} R_{ij}^{(\mu)} = \bar R \qquad \text{and} \qquad \frac{2}{mn(n-1)}\, \bar R = R,$$

$$\mathcal{E}\left[\frac{s_a^2}{m-1} - \frac{s^2}{mn-1}\right] = \frac{mn(n-1)}{mn-1}\, R. \tag{24}$$

Assuming $R > 0$ (positive average correlation) we may compare this result with (11) Section 1. The term on the right side of (24) is also of the same order of magnitude as that in (11). For negative $R$ the term on the right side of (24) is negative and the equation may be compared with (13) Section 1. We see that for the test functions $s^2/r$ and $s_a^2/r_a$ "positive (negative) average correlation within rows" has the same effect as "Lexis (Poisson) series" of populations.
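Consider now the test functions $\bar s_a^2$ and $S^2$. We find

$$\mathcal{E}[\bar s_a^2] = \mathcal{E}\left[m \sum (\bar a_\nu - a)^2\right] = \frac{n-1}{mn}\sum\sum \sigma_{\mu\nu}^2 + m \sum (\bar\alpha_\nu - \alpha)^2 - \frac{2}{mn}\, \bar R, \tag{25}$$

and

$$\mathcal{E}[S^2] = \mathcal{E}\left[\sum\sum (x_{\mu\nu} - a_\mu - \bar a_\nu + a)^2\right] = \frac{(m-1)(n-1)}{mn}\sum\sum \sigma_{\mu\nu}^2 + \sum\sum (\alpha_{\mu\nu} - \alpha_\mu - \bar\alpha_\nu + \alpha)^2 - \frac{2(m-1)}{mn}\, \bar R, \tag{25'}$$

where $\bar R$ denotes the sum of the covariances within rows, $\sum_{\mu}\sum_{i<j} R_{ij}^{(\mu)}$.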


Assuming (23) we get: 

and if $R > 0$:

The first equality is analogous to (11) and (14) of Section 2 for positive or negative $R$ respectively.¹⁰ We also get under the assumption (23)

$$\mathcal{E}\left[\frac{\bar s_a^2}{n-1}\right] = \mathcal{E}\left[\frac{S^2}{(m-1)(n-1)}\right] = \mathcal{E}\left[\frac{s_w^2}{m(n-1)}\right]. \tag{27}$$

¹⁰ I have studied in another paper the combination of Lexis series and "positive correlation within rows." It turns out that the two kinds of positive effects reinforce each other. The same is true for "negative correlation" and Poisson series. See [3].





These are the same equations as (12) Section 2, and they are true for either sign of $R$. Hence they provide no way to decide between Lexis series and correlated populations. But computing the expectations of the magnitudes which occur in (15) Section 2 we find from (22), (25) and (25')

$$\mathcal{E}\left[\frac{s_a^2}{m-1}\right] = \sigma^2 + (n-1)R, \qquad \mathcal{E}\left[\frac{\bar s_w^2}{n(m-1)}\right] = \sigma^2, \qquad \mathcal{E}\left[\frac{S^2}{(m-1)(n-1)}\right] = \sigma^2 - R. \tag{28}$$

And hence we may say:

If the observed value of $s_a^2/(m - 1)$ is greater than that of $\bar s_w^2/(n(m - 1))$ this can be explained either by the assumption of a Lexis series or a positive correlation within rows; but their equality indicates a Poisson series; and if the first is smaller than the second we may assume negative correlation.

In the same way we may explain

$$\left[\frac{\bar s_w^2}{n(m-1)}\right]_{\text{observed}} > \left[\frac{S^2}{(m-1)(n-1)}\right]_{\text{observed}}$$

either by positive correlation or by Lexis series; whereas the equality indicates a Poisson series and the sign $<$ indicates negative correlation.
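
The three expectations in (28) can be reproduced exactly with a simple model of correlation within rows. The sketch below is an added illustration with hypothetical names: it assumes $x_{\mu\nu} = u_\mu + e_{\mu\nu}$, where the row term $u_\mu$ and the cell terms $e_{\mu\nu}$ are independent and equal to $\pm 1$ with probability 1/2. Every cell then has mean 0 and variance $\sigma^2 = 2$, two cells of the same row have covariance $R = 1$, and cells of different rows are independent.

```python
from fractions import Fraction
from itertools import product


def expected_with_row_shocks(m, n):
    """Exact expectations of the three mean squares appearing in (28).

    Illustrative model (an assumption, not from the paper):
    x[mu][nu] = u[mu] + e[mu][nu], all u and e equal to -1 or +1 with
    probability 1/2 and independent, so sigma^2 = 2 and R = 1.
    """
    totals = {"sa2": Fraction(0), "sw2_bar": Fraction(0), "S2": Fraction(0)}
    cells = [(i, j) for i in range(m) for j in range(n)]
    outcomes = 0
    for draw in product((-1, 1), repeat=m + m * n):
        u, e = draw[:m], draw[m:]
        x = [[Fraction(u[i] + e[i * n + j]) for j in range(n)] for i in range(m)]
        row = [sum(x[i]) / n for i in range(m)]
        col = [sum(x[i][j] for i in range(m)) / m for j in range(n)]
        a = sum(row) / m
        totals["sa2"] += n * sum((r - a) ** 2 for r in row)
        totals["sw2_bar"] += sum((x[i][j] - col[j]) ** 2 for i, j in cells)
        totals["S2"] += sum((x[i][j] - row[i] - col[j] + a) ** 2 for i, j in cells)
        outcomes += 1
    ranks = {"sa2": m - 1, "sw2_bar": n * (m - 1), "S2": (m - 1) * (n - 1)}
    return {key: totals[key] / outcomes / ranks[key] for key in totals}


print(expected_with_row_shocks(2, 3))
```

With $R > 0$ the first expectation exceeds the second and the second exceeds the third, which is the same ordering that a Lexis series produces in (11) of this section.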

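(d). The variances of the test functions. We have still to find the variance of our test functions. Let us compute the variance of

$$S^2 = \sum\sum (x_{\mu\nu} - a_\mu - \bar a_\nu + a)^2$$

with respect to the $m \cdot n$ dimensional distribution $V(x_{11}) V(x_{12}) \cdots V(x_{mn})$. Let us put

$$x_{\mu\nu} - a_\mu - \bar a_\nu + a = y_{\mu\nu}, \tag{29}$$

then we see that the average of the $y_{\mu\nu}$ equals zero,

$$\bar y = \frac{1}{mn}\sum\sum y_{\mu\nu} = \frac{1}{mn}\sum\sum x_{\mu\nu} - \frac{1}{mn}\, n \sum a_\mu - \frac{1}{mn}\, m \sum \bar a_\nu + a = 0,$$

and

$$S^2 = \sum\sum (x_{\mu\nu} - a_\mu - \bar a_\nu + a)^2 = \sum\sum (y_{\mu\nu} - \bar y)^2.$$

Each $y_{\mu\nu}$ is a linear function of the $x_{\mu\nu}$, e.g.

$$\begin{aligned}
y_{11} &= \frac{(m-1)(n-1)}{mn}\, x_{11} - \frac{m-1}{mn}\sum_{j=2}^{n} x_{1j} - \frac{n-1}{mn}\sum_{i=2}^{m} x_{i1} + \frac{1}{mn}\sum_{i=2}^{m}\sum_{j=2}^{n} x_{ij} \\
&= \lambda_1 x_{11} + \lambda_2 \sum_{j=2}^{n} x_{1j} + \lambda_3 \sum_{i=2}^{m} x_{i1} + \lambda_4 \sum_{i=2}^{m}\sum_{j=2}^{n} x_{ij}.
\end{aligned} \tag{30}$$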


Using the same notations as in Section 1 (c) we find, because of the independence of each chance variable,

$$\mathrm{Var}(y_{11}) = \lambda_1^2 \sigma^2 + \lambda_2^2 (n-1)\sigma^2 + \lambda_3^2 (m-1)\sigma^2 + \lambda_4^2 (m-1)(n-1)\sigma^2 = \frac{(m-1)(n-1)}{mn}\, \sigma^2 \tag{31'}$$

and we find the same result for each $y_{\mu\nu}$:

$$\sigma'^2 = \frac{(m-1)(n-1)}{mn}\, \sigma^2, \tag{31}$$

in agreement with the fourth line of (7) of this section. We still need $M_4'$, the fourth moment about the mean of $y_{\mu\nu}$, which we can compute from the fourth moment of a sum. We find

$$M_4' = A M_4 + 6 B \sigma^4, \tag{32}$$

and we have

$$A = \lambda_1^4 + (n-1)\lambda_2^4 + (m-1)\lambda_3^4 + (m-1)(n-1)\lambda_4^4 = \frac{(m-1)(n-1)}{m^3 n^3}\,(m^2 - 3m + 3)(n^2 - 3n + 3),$$

and

$$\begin{aligned}
B ={}& \lambda_1^2 \left\{\lambda_2^2 (n-1) + \lambda_3^2 (m-1) + \lambda_4^2 (m-1)(n-1)\right\} \\
&+ \lambda_2^2 (n-1) \left\{\tfrac{1}{2}\lambda_2^2 (n-2) + \lambda_3^2 (m-1) + \lambda_4^2 (m-1)(n-1)\right\} \\
&+ \lambda_3^2 (m-1) \left\{\tfrac{1}{2}\lambda_3^2 (m-2) + \lambda_4^2 (m-1)(n-1)\right\} \\
&+ \tfrac{1}{2}\lambda_4^4 (m-1)(n-1)\left[(m-1)(n-1) - 1\right].
\end{aligned} \tag{33}$$

If we introduce the values of $\lambda_1, \lambda_2, \lambda_3, \lambda_4$ we find

$$\begin{aligned}
m^4 n^4 B ={}& (m-1)^3 (n-1)^3 (m+n) + (m-1)^2 (n-1)^2 (m+n-2) \\
&+ \tfrac{1}{2}(m-1)(n-1)\left[(m-1)^3 (n-2) + (n-1)^3 (m-2) + (mn - m - n)\right];
\end{aligned} \tag{34}$$

this expression as well as that of $A$ may be easily computed for different values of $m$ and $n$.
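
The closed forms for $A$ and for $m^4 n^4 B$ can be checked against their definitions as sums over the coefficients of $y_{11}$. The sketch below is an added illustration with hypothetical function names; it builds the list of coefficients of (30), forms $A = \sum c_i^4$ and $B = \sum_{i<j} c_i^2 c_j^2$ directly, and compares them with the closed expressions using exact rational arithmetic.

```python
from fractions import Fraction


def coefficients(m, n):
    """Coefficients of the x's in y_11 = x_11 - a_1 - abar_1 + a, one per cell."""
    mn = m * n
    lam1 = Fraction((m - 1) * (n - 1), mn)   # x_11 itself
    lam2 = Fraction(-(m - 1), mn)            # other cells of row 1
    lam3 = Fraction(-(n - 1), mn)            # other cells of column 1
    lam4 = Fraction(1, mn)                   # all remaining cells
    return ([lam1] + [lam2] * (n - 1) + [lam3] * (m - 1)
            + [lam4] * ((m - 1) * (n - 1)))


def A_B_direct(m, n):
    """A and B from their definitions as sums over the coefficients."""
    c = coefficients(m, n)
    A = sum(ci ** 4 for ci in c)
    B = sum(c[i] ** 2 * c[j] ** 2
            for i in range(len(c)) for j in range(i + 1, len(c)))
    return A, B


def A_B_closed(m, n):
    """A and B from the closed expressions, the latter as in (34)."""
    a, b = m - 1, n - 1
    A = Fraction(a * b * (m * m - 3 * m + 3) * (n * n - 3 * n + 3), m ** 3 * n ** 3)
    numerator = (a ** 3 * b ** 3 * (m + n) + a ** 2 * b ** 2 * (m + n - 2)
                 + Fraction(a * b, 2) * (a ** 3 * (n - 2) + b ** 3 * (m - 2)
                                         + (m * n - m - n)))
    return A, numerator / (m ** 4 * n ** 4)


for m, n in [(2, 2), (2, 3), (4, 5)]:
    print(m, n, A_B_direct(m, n), A_B_closed(m, n))
```

The same list of coefficients also gives $\sum c_i^2 = (m-1)(n-1)/mn$, the factor in (31).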

If $m$ and $n$ are large, $B$ is of order $\frac{1}{m} + \frac{1}{n}$; from (31)-(34) we see that in this case $\sigma'^2$ is approximately equal to $\sigma^2$ and $M_4'$ to $M_4$.

Using now (20') of Section 1 we find finally

$$\mathrm{Var}\left\{\sum\sum (x_{\mu\nu} - a_\mu - \bar a_\nu + a)^2\right\} = \frac{mn-1}{mn}\left\{(mn-1) M_4' - (mn-3)\sigma'^4\right\}$$

where $M_4'$ and $\sigma'^2$ are the expressions just computed. If we compare the variances of the test functions $s_a^2/(m - 1)$ and $S^2/((m - 1)(n - 1))$ we see that whereas the variance of the first expression is of order $1/m$ that of the second is of order $1/mn$. Hence, for large values of $n$ the latter expression is more exact than the former (see the analogous remark Section 1 (c)). A similar statement can be made if $\bar s_a^2/(n - 1)$ takes the place of $s_a^2/(m - 1)$.


3. Bivariate distributions. Analysis of covariance. 


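(a). Problem. Suppose $m$ persons are throwing two dice, $n$ times; we observe the respective numbers on each die in these $m \cdot n$ trials. Or we observe on $m$ groups of $n$ persons the color of the hair and of the eyes. Or else we state for $n$ years the yield of wheat (in bushels) per acre and the production cost (per bushel) for $m$ farms; etc.

We consider $m \cdot n$ pairs of numbers $x_{\mu\nu}$, $y_{\mu\nu}$. Let $V_{\mu\nu}(x, y)$¹¹ be the probability that $x_{\mu\nu} \leq x$ and $y_{\mu\nu} \leq y$, $V_{\mu\nu}(x, +\infty) = V_{\mu\nu}^{(1)}(x)$, $V_{\mu\nu}(+\infty, y) = V_{\mu\nu}^{(2)}(y)$ and introduce the following mean values and variances

$$\int\!\!\int x \, dV_{\mu\nu}(x, y) = \alpha_{\mu\nu}, \qquad \int\!\!\int y \, dV_{\mu\nu}(x, y) = \beta_{\mu\nu}, \tag{1}$$

$$\int\!\!\int (x - \alpha_{\mu\nu})^2 \, dV_{\mu\nu}(x, y) = \sigma_{\mu\nu}^2, \qquad \int\!\!\int (y - \beta_{\mu\nu})^2 \, dV_{\mu\nu}(x, y) = \tau_{\mu\nu}^2, \tag{2}$$

$$\int\!\!\int (x - \alpha_{\mu\nu})(y - \beta_{\mu\nu}) \, dV_{\mu\nu}(x, y) = \gamma_{\mu\nu}. \tag{3}$$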


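$$\begin{aligned}
\frac{1}{n}\sum_{\nu} \alpha_{\mu\nu} &= \alpha_\mu, & \frac{1}{m}\sum_{\mu} \alpha_{\mu\nu} &= \bar\alpha_\nu, & \frac{1}{mn}\sum_{\mu}\sum_{\nu} \alpha_{\mu\nu} &= \alpha, \\
\frac{1}{n}\sum_{\nu} \beta_{\mu\nu} &= \beta_\mu, & \frac{1}{m}\sum_{\mu} \beta_{\mu\nu} &= \bar\beta_\nu, & \frac{1}{mn}\sum_{\mu}\sum_{\nu} \beta_{\mu\nu} &= \beta.
\end{aligned} \tag{4}$$

Let us compute the mathematical expectations of certain test functions with respect to the $2mn$-dimensional distribution $V_{11}(x_{11}, y_{11})\, V_{12}(x_{12}, y_{12}) \cdots V_{mn}(x_{mn}, y_{mn})$. Let

$$E[F(x_{11}, y_{11}, \cdots x_{mn}, y_{mn})] = \int \cdots \int F(x_{11}, \cdots y_{mn}) \, dV_{11}(x_{11}, y_{11}) \cdots dV_{mn}(x_{mn}, y_{mn}). \tag{5}$$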


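¹¹ In the particular case where $V_{\mu\nu}(x, y)$ has everywhere a derivative $\frac{\partial^2 V_{\mu\nu}}{\partial x \, \partial y}$ we have the two dimensional density $v_{\mu\nu}(x, y) = \frac{\partial^2 V_{\mu\nu}}{\partial x \, \partial y}$ and the one-dimensional densities

$$v_{\mu\nu}^{(1)}(x) = \int v_{\mu\nu}(x, y) \, dy; \qquad v_{\mu\nu}^{(2)}(y) = \int v_{\mu\nu}(x, y) \, dx,$$

and we have

$$\alpha_{\mu\nu} = \int x \, v_{\mu\nu}^{(1)}(x) \, dx, \qquad \beta_{\mu\nu} = \int y \, v_{\mu\nu}^{(2)}(y) \, dy.$$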



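We then have¹²

$$E[G(x_{11}, \cdots x_{mn})] = \int \cdots \int G(x_{11}, \cdots x_{mn}) \, dV_{11}^{(1)}(x_{11}) \cdots dV_{mn}^{(1)}(x_{mn}). \tag{5'}$$

In analogy with previous notations we introduce

$$\begin{aligned}
a_\mu &= \frac{1}{n}\sum_{\nu} x_{\mu\nu}, & \bar a_\nu &= \frac{1}{m}\sum_{\mu} x_{\mu\nu}, & a &= \frac{1}{mn}\sum_{\mu}\sum_{\nu} x_{\mu\nu}, \\
b_\mu &= \frac{1}{n}\sum_{\nu} y_{\mu\nu}, & \bar b_\nu &= \frac{1}{m}\sum_{\mu} y_{\mu\nu}, & b &= \frac{1}{mn}\sum_{\mu}\sum_{\nu} y_{\mu\nu}
\end{aligned} \tag{6}$$

and

$$\begin{aligned}
s^2 &= \sum\sum (x_{\mu\nu} - a)^2, & s_a^2 &= n \sum (a_\mu - a)^2, & s_w^2 &= \sum\sum (x_{\mu\nu} - a_\mu)^2, \\
S^2 &= \sum\sum (x_{\mu\nu} - a_\mu - \bar a_\nu + a)^2, & \bar s_a^2 &= m \sum (\bar a_\nu - a)^2, & \bar s_w^2 &= \sum\sum (x_{\mu\nu} - \bar a_\nu)^2, \\
t^2 &= \sum\sum (y_{\mu\nu} - b)^2, & t_a^2 &= n \sum (b_\mu - b)^2, & t_w^2 &= \sum\sum (y_{\mu\nu} - b_\mu)^2, \\
T^2 &= \sum\sum (y_{\mu\nu} - b_\mu - \bar b_\nu + b)^2, & \bar t_a^2 &= m \sum (\bar b_\nu - b)^2, & \bar t_w^2 &= \sum\sum (y_{\mu\nu} - \bar b_\nu)^2
\end{aligned} \tag{7}$$

and

$$\begin{aligned}
c &= \sum\sum (x_{\mu\nu} - a)(y_{\mu\nu} - b), & C &= \sum\sum (x_{\mu\nu} - a_\mu - \bar a_\nu + a)(y_{\mu\nu} - b_\mu - \bar b_\nu + b), \\
c_a &= n \sum (a_\mu - a)(b_\mu - b), & c_w &= \sum\sum (x_{\mu\nu} - a_\mu)(y_{\mu\nu} - b_\mu), \\
\bar c_a &= m \sum (\bar a_\nu - a)(\bar b_\nu - b), & \bar c_w &= \sum\sum (x_{\mu\nu} - \bar a_\nu)(y_{\mu\nu} - \bar b_\nu);
\end{aligned} \tag{8}$$

we then have

$$\begin{aligned}
s^2 &= S^2 + \bar s_a^2 + s_a^2 = s_w^2 + s_a^2 = \bar s_w^2 + \bar s_a^2, \\
t^2 &= T^2 + \bar t_a^2 + t_a^2 = t_w^2 + t_a^2 = \bar t_w^2 + \bar t_a^2, \\
c &= C + c_a + \bar c_a = c_a + c_w = \bar c_a + \bar c_w,
\end{aligned} \tag{9}$$

and corresponding relations for the ranks of these quadratic forms. We find for the expectations of these test functions, in analogy with previously investigated formulae:

$$E\left[\frac{t^2}{mn-1}\right] = \frac{1}{mn}\sum\sum \tau_{\mu\nu}^2 + \frac{1}{mn-1}\sum\sum (\beta_{\mu\nu} - \beta)^2,$$

$$E\left[\frac{t_a^2}{m-1}\right] = \frac{1}{mn}\sum\sum \tau_{\mu\nu}^2 + \frac{n}{m-1}\sum (\beta_\mu - \beta)^2,$$

and

$$E\left[\frac{c_a}{m-1}\right] = \frac{1}{mn}\sum\sum \gamma_{\mu\nu} + \frac{n}{m-1}\sum (\alpha_\mu - \alpha)(\beta_\mu - \beta).$$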


¹² It may be mentioned that the problem considered in this section of $mn$ bivariate distributions $V_{\mu\nu}(x, y)$ constitutes, of course, only a particular case of dependence (see Section 2, (c)) for a $2mn$ dimensional population $V(x_{11}, y_{11}, x_{12}, y_{12}, \cdots x_{mn}, y_{mn})$.






1) If all the $\alpha_{\mu\nu}$ equal each other, or all the $\beta_{\mu\nu}$ equal each other, we find:

U'^i] - i,] ■ 

These formulae provide us with unbiased estimates of . 

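2) If the $\alpha_{\mu\nu}$ are equal within each row but differ from row to row (Lexis), $\alpha_{\mu\nu} = \alpha_\mu$, $\alpha_\mu \not\equiv \alpha$; $\bar\alpha_\nu = \alpha$, whereas the $\beta_{\mu\nu}$ may have arbitrary values, then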

(13) -J - 

The same equalities are valid for arbitrary $\alpha_{\mu\nu}$ if the $\beta_{\mu\nu} = \beta_\mu$. Our
new equalities may be of some interest because inequalities analogous to those
of the Lexis case cannot be proved for covariances. If the observed values of
the expressions in (13) are significantly different we may conclude that neither
the $\alpha_{\mu\nu}$ nor the $\beta_{\mu\nu}$ form a Lexis series. A judgment of the test (13) might be
based on the investigation of its power function. But besides we have the 
equalities (12) and analogous equalities containing ll, T’ and I* . 

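3) If either $\alpha_{\mu\nu} = \bar\alpha_\nu$, $\bar\alpha_\nu \not\equiv \alpha$, $\alpha_\mu = \alpha$,

or $\beta_{\mu\nu} = \bar\beta_\nu$, $\bar\beta_\nu \not\equiv \beta$, $\beta_\mu = \beta$,

we have the new equalities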

(U, 

and there are no inequalities analogous to the inequalities (14) of Section 2, and 
(13), (14) of Section 1. 

Most of the investigations of Sections 1 and 2 can be generalized for this two 
dimensional problem. 

BIBLIOGRAPHY 

[1] R. A. Fisher, Statistical Methods for Research Workers, 6th ed., p. 214 ff.

[2] R. A. Fisher, "Applications of 'Student's' distributions," Metron, Vol. 6 (1926), pp. 90-104.

[3] H. Geiringer, "A new explanation of non-normal dispersion in the Lexis theory," Econometrica, Vol. 10 (1942), pp. 53-60.

[4] F. R. Helmert, Zeits. für Math. und Physik, Vol. 21 (1876), p. 102-218.

[5] J. O. Irwin, "Mathematical theorems involved in the analysis of variance," Jour. Roy. Stat. Soc., Vol. 94 (1931), pp. 284-300.

[6] W. G. Madow, "Limiting distributions of quadratic and bilinear forms," Annals of Math. Stat., Vol. 11 (1940), pp. 125-147.

[7] R. v. Mises, "Théorie des probabilités. Fondements et applications," Annales de l'Institut Poincaré, (1931), pp. 137-190.

[8] H. L. Rietz, "On the Lexis theory and the analysis of variance," Bull. Am. Math. Soc., (1932), pp. 731 ff.

[9] A. A. Tschuprow, Skandinavisk Aktuarietidskrift, Vol. 6 (1918).

[10] A. Wald, Lectures on the Analysis of Variance and Covariance, Columbia University, 1941.

[11] Milton Friedman, "The use of ranks to avoid the assumption of normality," Jour. Amer. Stat. Assn., Vol. 32 (1937), pp. 675-701.





ON THE RATIO OF THE VARIANCES OF TWO NORMAL POPULATIONS

By Henry Scheffé
Princeton University

CONTENTS

1. Introduction and summary

Part I. Significance tests and confidence intervals based on the F-distribution

2. The F-distribution
3. Use of one tail
4. Symmetry condition
5. Logarithmically shortest confidence intervals
6. Reciprocal limits
7. The likelihood ratio
8. Equal tails
9. Comparison of the tests and confidence intervals

Part II. Significance tests and confidence intervals based on any similar regions

10. Common best critical regions
11. Type B₁ region
12. Neyman's categories of confidence intervals


1. Introduction and summary. Suppose that we have two samples $E_1$ and $E_2$ from normal populations $\pi_1$ and $\pi_2$ with unknown means and variances. Let us designate by $\theta$ the ratio of the variance of $\pi_2$ to that of $\pi_1$. The two problems discussed in this paper are to formulate in terms of $E_1$ and $E_2$, and to compare,

(i) significance tests for the hypothesis that the unknown ratio $\theta$ is equal to a given positive number $\theta_0$, and

(ii) confidence intervals for $\theta$.

Since, on the one hand, these problems are of considerable importance to the practical statistician and the teacher of statistics, and on the other, they cry for the application of recently developed theory which is unfortunately not yet familiar to many practical workers and teachers, the development has been divided into two parts: Part I, it is hoped, will be intelligible to the above class of readers; part II, slanted toward a smaller circle, is more esoteric, general, and condensed.

More specifically, in part I it is pointed out that any choice of limits on the F-distribution satisfying the condition that the sum of the areas in the tails be equal to a prescribed number, leads to solutions of problems (i) and (ii). After considering and then ruling out the "one-sided" situations in which it is appropriate to use only one tail, two conditions are proposed (ad hoc and on an intuitive basis) for the "two-sided" case: a symmetry condition, and a condition for logarithmically shortest confidence intervals. The second condition leads to a choice of limits on the F-distribution. From other considerations, reciprocal limits, likelihood ratio, and equal tails, other choices are advanced. It is found that all four of these choices satisfy the first condition, and that furthermore if $N_1 = N_2$, where $N_i$ is the number of variates in $E_i$, then the four choices become identical. If $N_1 \neq N_2$ which of the four tests is "best"? Which of the four sets of confidence intervals? For defining and answering the first question in a logically satisfactory way just a little of the Neyman-Pearson theory of testing hypotheses suffices. For the second, Neyman's theory of confidence intervals is called for, and because of its greater difficulty, this has been relegated to part II. However, the limits determined by the criterion that the test be unbiased turn out to be the same as those which yield optimum confidence intervals from the elementary viewpoint of §5. Their numerical values are unfortunately laborious to calculate accurately if $N_1 \neq N_2$, and part I concludes with some numerical evidence indicating the loss of efficiency in using instead the easily found "equal tails" limits. For $N_1$ and $N_2 \geq 10$ this loss is seen to be quite small. It will perhaps bear repeating that if $N_1 = N_2$, the "equal tails" limits on the F-distribution are the same as those associated with the unbiased test and that hence in this case all the advantages uncovered in parts I and II for the unbiased test and the related confidence intervals are obtained by using the easily available "equal tails" limits.

In part II we drop the restriction that the tests be based on a one or two-tailed
use of the F-distribution. By a slight extension of results of Neyman and
Pearson, common best critical regions for testing the hypothesis $\theta = \theta_0$ against
alternatives $\theta < \theta_0$, or $\theta > \theta_0$, are found. Since the regions are always distinct
for these two "one-sided" cases, there is no uniformly most powerful test. In
order to find the most efficient unbiased test some recently published theorems
of the writer are applied to prove that the critical region of the unbiased test
proposed in part I is of type $B_1$.

The fact that the results summarized in the above paragraph are obtained
for arbitrary positive $\theta_0$ will immediately suggest to the reader familiar with
Neyman's theory of confidence intervals that it may be easy on the basis of
those results to draw conclusions about the existence of Neyman's various
categories of confidence intervals. It is. In particular we find that the set of
confidence intervals arrived at in §5 constitutes Neyman's short unbiased set.

The writer is aware that not all the results of this paper are new, and hopes
he has given credit where it is due, but believes it desirable to bring together all
the results, old and new, in this attempt to clean up the problems (i) and (ii).
He is pleased to acknowledge his debt to Mr. David Votaw for aiding in the
calculations for fig. 1 and for finding the formulas (6).

Part I. Significance Tests and Confidence Intervals Based on the F-Distribution

2. The F-distribution. The sample $E_i$: $(x_{i1}, x_{i2}, \cdots, x_{iN_i})$, $i = 1, 2$, is
assumed to be from a normal population $\pi_i$ with mean $a_i$ and variance $\sigma_i^2$. We
write $\theta = \sigma_1^2/\sigma_2^2$, and might regard the statistic $T$ as an estimate¹ of $\theta$, where
$T = s_1^2/s_2^2$ and

$$s_i^2 = \sum_{j=1}^{N_i} (x_{ij} - \bar{x}_i)^2 / n_i, \qquad \bar{x}_i = \sum_{j=1}^{N_i} x_{ij}/N_i, \qquad n_i = N_i - 1.$$

It will be convenient to consider $\theta$, $\sigma_2$, $a_1$, $a_2$ as the population parameters, $\sigma_1^2$
being eliminated from the joint p.d.f. (probability density function) of $E_1$
and $E_2$ by the substitution $\sigma_1^2 = \theta\sigma_2^2$. For any given positive number $\theta_0$ we
define the composite hypothesis

$$H_0: \quad \theta = \theta_0, \quad 0 < \sigma_2 < +\infty, \quad -\infty < a_1 < +\infty, \quad -\infty < a_2 < +\infty.$$

In Hotelling's apt terminology the last three parameters are nuisance parameters.
It is well known that $U_1$ and $U_2$, where $U_i = n_i s_i^2/\sigma_i^2$, are independently
distributed according to $\chi^2$-laws with $n_1$ and $n_2$ degrees of freedom respectively,
and that hence the quotient $F = (U_1/n_1) \div (U_2/n_2) = T/\theta$ has the F-distribution
$h_{n_1 n_2}(F)\,dF$ with $n_1$ and $n_2$ degrees of freedom, where

$$h_{n_1 n_2}(u) = \frac{(n_1/n_2)^{\frac12 n_1}}{B(\frac12 n_1, \frac12 n_2)}\, u^{\frac12 n_1 - 1} \left(1 + \frac{n_1}{n_2}u\right)^{-\frac12 (n_1 + n_2)}, \qquad 0 \le u \le \infty.$$


For later reference we note that if we define the variable $x$ from

$$F = \frac{n_2 x}{n_1 (1 - x)}, \tag{1}$$

then the cumulative distribution function of $x$ is the incomplete Beta function²
$I_x(\tfrac12 n_1, \tfrac12 n_2)$.
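The change of variable (1) is used repeatedly below, so a small numerical sketch may help. The function names `f_to_x` and `x_to_f` are illustrative only and do not come from the paper.

```python
def f_to_x(f, n1, n2):
    """Solve (1) for x: x = n1*F / (n2 + n1*F), so 0 <= x < 1 for F >= 0."""
    return n1 * f / (n2 + n1 * f)


def x_to_f(x, n1, n2):
    """Transformation (1) itself: F = n2*x / (n1*(1 - x)), for 0 <= x < 1."""
    return n2 * x / (n1 * (1.0 - x))


# Round trip for n1 = 10, n2 = 20: F = 1 corresponds to x = 1/3.
x = f_to_x(1.0, 10, 20)
f_back = x_to_f(x, 10, 20)
```

Because the map is increasing in $F$, limits $A$, $B$ on the F-scale correspond to limits $a$, $b$ on the Beta scale in the same order.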

Let $\alpha$ be any number such that $0 < \alpha < 1$ ($\alpha$ will be the significance level
for (i); $1 - \alpha$, the confidence coefficient for (ii)). The symbols $A_{n_1 n_2}$, $B_{n_1 n_2}$
will always denote a pair of numbers for which³

$$\int_{A_{n_1 n_2}}^{B_{n_1 n_2}} h_{n_1 n_2}(u)\,du = 1 - \alpha. \tag{2}$$

Every choice of the pair $A$, $B$ leads to a solution of problems (i) and (ii):

(i). A test of $H_0$ at significance level $\alpha$ consists of rejecting $H_0$ if $T < A_{n_1 n_2}\theta_0$ or
$T > B_{n_1 n_2}\theta_0$.

The probability of rejecting $H_0$ if it is true is

$$1 - \Pr(A\theta_0 \le T \le B\theta_0 \mid \theta_0) = 1 - \Pr(A \le T/\theta_0 \le B \mid \theta_0) = \alpha,$$

independently of the true values of the nuisance parameters.


¹ Biased.

² All the results of this paper pertaining to the F-distribution could of course be stated
in terms of Fisher's z-distribution [2] or the incomplete Beta distribution; the first is used
here because of its popularity in applied statistics, and because it permits the simplest
statements for solutions of problems (i) and (ii).

³ Superscripts on $A$, $B$ will signify that a further condition has been laid on the pair
$A$, $B$. The subscripts will be dropped when there is no danger of confusion. We permit
$B = \infty$ as a possible choice.





(ii). A set of confidence intervals for $\theta$ with confidence coefficient $1 - \alpha$ is⁴

$$T/B_{n_1 n_2} \le \theta \le T/A_{n_1 n_2}.$$

The probability that the true value of $\theta$ will be covered by the above random
interval is

$$\Pr(T/B \le \theta \le T/A \mid \theta) = \Pr(A \le T/\theta \le B \mid \theta) = 1 - \alpha,$$

whatever be the true values of $\theta$ and the nuisance parameters.

It will be convenient to adopt a brief notation for the tests and confidence
intervals determined by certain choices of the limits $A$, $B$. In the sequel we
shall denote these choices by $A^{i}_{n_1 n_2}$, $B^{i}_{n_1 n_2}$, where $i$ = I, II, $\cdots$, VI. We
shall call the significance test based on the pair $A^{i}$, $B^{i}$ the test $i$, and the set of
confidence intervals based on this pair, the set $i$ of confidence intervals, or
sometimes more briefly, the confidence intervals $i$.
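The recipe of this section (compute $T$, compare it with $A\theta_0$ and $B\theta_0$, and invert to get the interval) can be sketched as follows. The helper names are hypothetical, and the limits 0.104 and 9.60 are assumed illustrative values, roughly the equal-tails pair for $n_1 = n_2 = 4$ at $\alpha = .05$ as read from standard F-tables; any pair satisfying (2) could be substituted.

```python
from statistics import variance


def variance_ratio(sample1, sample2):
    """T = s1^2 / s2^2, each s_i^2 using the divisor n_i = N_i - 1."""
    return variance(sample1) / variance(sample2)


def reject_h0(t, theta0, a, b):
    """Test (i): reject H0 when T < A*theta0 or T > B*theta0."""
    return t < a * theta0 or t > b * theta0


def confidence_interval(t, a, b):
    """Interval (ii): T/B <= theta <= T/A."""
    return (t / b, t / a)


# Made-up data with N1 = N2 = 5, hence n1 = n2 = 4.
e1 = [1.0, 2.0, 3.0, 4.0, 5.0]
e2 = [2.0, 4.0, 6.0, 8.0, 10.0]
A, B = 0.104, 9.60            # assumed illustrative limits, see lead-in

t = variance_ratio(e1, e2)
rejected = reject_h0(t, 1.0, A, B)
lower, upper = confidence_interval(t, A, B)
```

The test and the interval are two views of the same inequality: $\theta_0$ lies inside the interval exactly when the test does not reject $H_0$.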


3. Use of one tail. Suppose a situation in which we do not mind accepting
$H_0$ if the true value of $\theta$ exceeds $\theta_0$, but we desire a test which is as sensitive as
possible in rejecting $H_0$ when $\theta < \theta_0$. It can be shown (for $n_2 > 2$) that the
expected value of $T$ is $\mathcal{E}(T) = n_2\theta/(n_2 - 2)$, and hence when the true value of $\theta$
is small compared with $\theta_0$, so is $\mathcal{E}(T)$. By the usual intuitive considerations we
are led to rejecting $H_0$ if $F = T/\theta_0$ falls in the left tail of the F-distribution. To
make the significance level equal to $\alpha$ we take the limits $A$, $B$ so that

$$\int_0^{A^{I}} h_{n_1 n_2}(u)\,du = \alpha, \qquad B^{I} = \infty.$$

Similarly, to test $H_0$ against alternatives $\theta > \theta_0$ we define test II by

$$A^{II} = 0, \qquad \int_{B^{II}}^{\infty} h_{n_1 n_2}(u)\,du = \alpha.$$

Why test I is best for testing $H_0$ against alternatives $\theta < \theta_0$, and test II for
$\theta > \theta_0$, will be explained more convincingly in §9.

The confidence intervals I and II are then semi-infinite. It is apparent that
if we are not loath to accept large values of $\theta$ but wish to exclude the largest
possible interval of small values $(0, T/B)$, we should use the set II. Indeed,
the set II is optimum in the case where we are willing to accept values of $\theta$ larger
than the true value but desire the highest possible probability of excluding any
values less than the true value; however, the precise formulation and proof
of this statement must be postponed to part II. Analogous remarks apply to
the set I and a willingness to accept values of $\theta$ less than the true value.

For $\alpha = .05$ or $.01$ the values of $B^{II}_{n_1 n_2}$ are given in Snedecor's F-tables [12;
same $n_1$, $n_2$ as ours], and the values of $A^{I}_{n_1 n_2}$ may be calculated from the same
tables by using the relation

$$A^{I}_{n_1 n_2} = 1/B^{II}_{n_2 n_1}. \tag{3}$$

$B^{II}_{n_1 n_2}$ for $\alpha = .50, .25, .10, .025, .005$ may be obtained by use of the
transformation (1) and Thompson's new tables [13] of percentage points for the incomplete
Beta distribution. $A^{I}_{n_1 n_2}$ for these values of $\alpha$ can then be found from (3).


⁴ If $B = \infty$ we omit the equality sign to the left of $\theta$; if $A = 0$, the equality sign to the
right of $\theta$.

4. Symmetry condition. We now restrict our attention (until §9) to the
"two-sided" situation in which we are interested in all alternatives to $\theta = \theta_0$
on the range $0 < \theta < \infty$. Let us contemplate the following symmetry condition:

$$A_{n_1 n_2} = 1/B_{n_2 n_1} \tag{4}$$

for all positive integers $n_1$, $n_2$. The desirability of this condition and that of
§5 follows not from mathematical principles but from practical considerations
which might be relevant whenever significance tests or confidence intervals are
considered for a parameter $\theta$ which is the quotient of two other positive
parameters $\theta_1$ and $\theta_2$, and the estimate of $\theta$ is the quotient of the estimates of $\theta_1$ and $\theta_2$.

Suppose that given the samples $E_1$ and $E_2$, computer $C_1$ labels them 1, 2,
the same way we have, and using our test of §2, rejects the hypothesis that
$\sigma_1^2/\sigma_2^2 = k$ unless

$$A_{n_1 n_2} k \le s_1^2/s_2^2 \le B_{n_1 n_2} k,$$

while computer $C_2$ labels them 2, 1, and following a similar rule rejects $\sigma_2^2/\sigma_1^2 = 1/k$
(in our notation) unless

$$A_{n_2 n_1}/k \le s_2^2/s_1^2 \le B_{n_2 n_1}/k.$$

It will be seen that (4) is merely the condition that they reach the same
conclusion. This makes life simpler, at least for computers and consulting
statisticians. Likewise, if $C_1$ and $C_2$ use the confidence intervals of §2, then they will
make numerically equivalent statements about $\sigma_1^2/\sigma_2^2$ and $\sigma_2^2/\sigma_1^2$ if (4) is satisfied.
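The agreement of the two computers under (4) can be checked mechanically. In this sketch the pair `a12`, `b12` is an arbitrary assumed choice for $(n_1, n_2)$; the pair for $(n_2, n_1)$ is then forced by the symmetry condition. All names are illustrative.

```python
def accepts(ratio, k, a, b):
    """Accept the hypothesised variance ratio k when A*k <= ratio <= B*k."""
    return a * k <= ratio <= b * k


# Assumed limits for (n1, n2); condition (4) then fixes those for (n2, n1).
a12, b12 = 0.29, 2.77
a21, b21 = 1.0 / b12, 1.0 / a12


def same_conclusion(s1_sq, s2_sq, k):
    """Compare computer C1 (labels 1, 2) with computer C2 (labels 2, 1)."""
    c1 = accepts(s1_sq / s2_sq, k, a12, b12)
    c2 = accepts(s2_sq / s1_sq, 1.0 / k, a21, b21)
    return c1 == c2


agree = all(same_conclusion(s1, s2, k)
            for s1 in (0.5, 1.0, 3.0, 10.0)
            for s2 in (1.0, 2.0)
            for k in (0.5, 1.0, 2.0))
```

If `a21`, `b21` were chosen without regard to (4), some data sets would be accepted under one labelling and rejected under the other.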

5. Logarithmically shortest confidence intervals. The length of the
confidence intervals of §2 is $L = T(A^{-1} - B^{-1})$. We might consider choosing $A$, $B$
in such a way that $\mathcal{E}(L)$ is minimum. This leads to the problem of minimizing
$A^{-1} - B^{-1}$ subject to (2). It might seem just as desirable, however, to minimize
the expected length of the confidence interval for $\theta^{1/2}$,

$$(T/B)^{1/2} \le \theta^{1/2} \le (T/A)^{1/2}.$$

This leads to a different problem with a different solution.

The condition on confidence intervals for $\theta$ which appears intuitively desirable
to the writer, is that the limits $\underline{\theta}$, $\bar{\theta}$ of the confidence interval
$\underline{\theta}(E_1, E_2) \le \theta \le \bar{\theta}(E_1, E_2)$ be such that $\mathcal{E}(\log \bar{\theta} - \log \underline{\theta})$ is minimum.
For the confidence intervals of §2 this is equivalent to minimizing $B/A$, and by using the method of
Lagrange's multipliers we easily find that

$$A\,h_{n_1 n_2}(A) - B\,h_{n_1 n_2}(B) = 0 \tag{5}$$

and (2) must be satisfied. Denote the solution⁵ by $A^{III}_{n_1 n_2}$, $B^{III}_{n_1 n_2}$. It is evident
that the same condition (5) is obtained if we ask for logarithmically shortest
confidence intervals (based on the F-distribution) for $\theta^k$ where $k > 0$.

The numerical values of the limits $A^{III}$, $B^{III}$ are difficult to calculate if $n_1 \neq n_2$.
The best procedure seems to be to transform to the incomplete Beta distribution
by means of (1) and to calculate the corresponding points $a_{n_1 n_2}$, $b_{n_1 n_2}$ from the
equations

$$I_b(\tfrac12 n_1 + i, \tfrac12 n_2) - I_a(\tfrac12 n_1 + i, \tfrac12 n_2) = 1 - \alpha, \qquad i = 0, 1. \tag{6}$$

The points $a$, $b$ can be found to two decimals by inspection of Pearson's tables
[9]. Unfortunately, in the many cases where $a$ is close to 0, or $b$ to 1, $A^{III}$, $B^{III}$
are then subject to enormous error when calculated from (1).
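Condition (5) is easy to evaluate numerically once the density is coded. The sketch below is not a solver for $A^{III}$, $B^{III}$ (condition (2) is not imposed); it only evaluates the left side of (5), and shows that for $n_1 = n_2$ any reciprocal pair $B = 1/A$ satisfies it while for $n_1 \neq n_2$ it generally does not. Function names are illustrative.

```python
import math


def h(u, n1, n2):
    """F-density h_{n1 n2}(u) for u > 0, computed on the log scale."""
    log_beta = (math.lgamma(n1 / 2) + math.lgamma(n2 / 2)
                - math.lgamma((n1 + n2) / 2))
    log_h = (0.5 * n1 * math.log(n1 / n2) - log_beta
             + (0.5 * n1 - 1.0) * math.log(u)
             - 0.5 * (n1 + n2) * math.log1p(n1 * u / n2))
    return math.exp(log_h)


def condition_5(a, b, n1, n2):
    """Left side of (5): A*h(A) - B*h(B); zero for the limits of test III."""
    return a * h(a, n1, n2) - b * h(b, n1, n2)


equal_df = condition_5(1.0 / 3.0, 3.0, 8, 8)      # essentially zero
unequal_df = condition_5(1.0 / 3.0, 3.0, 4, 12)   # clearly non-zero
```

This is the mechanism behind the remark in the introduction that the several two-sided choices coincide when the sample sizes are equal.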

6. Reciprocal limits. While the problems (i) and (ii) are closely related, the
last choice of limits was suggested solely by our consideration of (ii). Later
we will reconsider this choice from the standpoint of (i); the reader may
anticipate that it will again be found advantageous in some respect. For the
present, we proceed to three further choices, these arising from various
approaches to (i).

The procedure recommended in several statistics manuals (see §8) for testing
the hypothesis $\theta = 1$ is to refer the quotient of the larger of $s_1^2$, $s_2^2$ by the smaller
to tables. This suggests the introduction of a statistic $M$ defined as the
maximum of $T$, $T^{-1}$. Its distribution⁶ under the hypothesis $\theta = 1$ is easily found:
Let $g_{n_1 n_2}(M)$ be its p.d.f. Then for $1 \le u \le \infty$,

$$g_{n_1 n_2}(u)\,du = \Pr(u < M < u + du \mid \theta = 1)$$

$$= \Pr(u < T < u + du \ \text{or} \ u < T^{-1} < u + du)$$

$$= \Pr(u < T < u + du) + \Pr(u < T^{-1} < u + du),$$

since the last two terms are the probabilities of mutually exclusive events.
Furthermore, the first term is $h_{n_1 n_2}(u)\,du$, and because of the symmetry induced
by $\theta = 1$ we can evaluate the second term by merely interchanging subscripts.
Hence the desired distribution is

$$g_{n_1 n_2}(u) = h_{n_1 n_2}(u) + h_{n_2 n_1}(u),$$

regardless of the true values of the nuisance parameters.
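As a sanity check on the density just derived, one can confirm numerically that $g_{n_1 n_2}$ integrates to one over $1 \le u < \infty$. The sketch below uses a plain midpoint rule after the substitution $u = 1/t$; the function names and the step count are arbitrary choices, not part of the paper.

```python
import math


def h(u, n1, n2):
    """F-density h_{n1 n2}(u) for u > 0."""
    log_beta = (math.lgamma(n1 / 2) + math.lgamma(n2 / 2)
                - math.lgamma((n1 + n2) / 2))
    return math.exp(0.5 * n1 * math.log(n1 / n2) - log_beta
                    + (0.5 * n1 - 1.0) * math.log(u)
                    - 0.5 * (n1 + n2) * math.log1p(n1 * u / n2))


def g(u, n1, n2):
    """Density of M = max(T, 1/T) under theta = 1, valid for u >= 1."""
    return h(u, n1, n2) + h(u, n2, n1)


def total_mass_of_g(n1, n2, steps=4000):
    """Midpoint rule for the integral of g over [1, inf), with u = 1/t."""
    dt = 1.0 / steps
    total = 0.0
    for i in range(steps):
        t = (i + 0.5) * dt
        total += g(1.0 / t, n1, n2) / (t * t) * dt
    return total


mass = total_mass_of_g(10, 20)
```

Since $g$ is symmetric in its two subscripts, the upper percentage points of $M$ do not depend on which sample is labelled 1.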


⁵ It can be shown by elementary methods that the solution of these equations exists and
is unique; likewise for the solutions later denoted by superscripts IV and V.

⁶ Considered by K. Pearson [8].





If we reject the hypothesis $\theta = 1$ if $M > B^{IV}_{n_1 n_2}$, where

$$\int_{B^{IV}_{n_1 n_2}}^{\infty} g_{n_1 n_2}(u)\,du = \alpha,$$

then this significance test is easily shown to be the same as that of §2 with
$\theta_0 = 1$ and

$$A^{IV}_{n_1 n_2} = 1/B^{IV}_{n_1 n_2}.$$

We remark that again these limits are not easy to compute if $n_1 \neq n_2$.
While this choice of $A$, $B$, which we shall call $A^{IV}_{n_1 n_2}$, $B^{IV}_{n_1 n_2}$, has been motivated
only for the case $\theta_0 = 1$, it leads of course to a test IV for any $\theta_0$ and a set IV
of confidence intervals.



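7. The likelihood ratio. Since the properties of $\lambda$-criteria in general have
received much attention in the literature, and since in particular the $\lambda$-test for
$H_0$ is equivalent to a certain choice of $A$, $B$, we shall mention it here, and see
whether it has any advantages in §9. $\lambda$ for $H_0$ in the case $\theta_0 = 1$ was given by
Pearson and Neyman [7; their $H_1$, $n_i$, $s_i^2$, $\theta$, $\lambda_{H_1}$ are our $H_0$, $N_i$, $s_i^2(N_i - 1)/N_i$,
$N_1(N_2 - 1)/\{N_2(N_1 - 1)T\}$, $\lambda$]; for any $\theta_0$ it may be shown to be

$$\lambda = C_{n_1 n_2}\, F^{\frac12 N_1} \left(1 + \frac{n_1}{n_2} F\right)^{-\frac12 (N_1 + N_2)}.$$

On considering the (bell-shaped) graph of $\lambda$ against $F$ we see that $\lambda < \lambda_0$
corresponds to two intervals, say $0 \le F < F'$ and $F'' < F \le \infty$. The $\lambda$-test,
which consists of rejecting $H_0$ when $\lambda < \lambda_0$, where $\lambda_0$ is determined so that the
significance level is $\alpha$, is thus equivalent to test V with $A^{V}_{n_1 n_2}$, $B^{V}_{n_1 n_2}$ satisfying
(2) and

$$A^{\frac12 N_1} \left(1 + \frac{n_1}{n_2} A\right)^{-\frac12 (N_1 + N_2)} = B^{\frac12 N_1} \left(1 + \frac{n_1}{n_2} B\right)^{-\frac12 (N_1 + N_2)}.$$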

8. Equal tails. Perhaps the most venerable procedure for determining limits
on a distribution for a significance test in a "two-sided" case is to choose them
so that the tails of the distribution have equal areas. Define $A^{VI}_{n_1 n_2}$, $B^{VI}_{n_1 n_2}$ from

$$\int_0^{A^{VI}} h_{n_1 n_2}(u)\,du = \int_{B^{VI}}^{\infty} h_{n_1 n_2}(u)\,du = \tfrac12 \alpha.$$

The values of $B^{VI}_{n_1 n_2}$ for $\alpha = .10$ and $.02$ are given in the F-tables [12; same
$n_1$, $n_2$ as ours] as 5% and 1% points. The relation

$$A^{VI}_{n_1 n_2} = 1/B^{VI}_{n_2 n_1} \tag{7}$$

is easy to get, and hence $A^{VI}_{n_1 n_2}$ for these values of $\alpha$ may also be calculated from
the F-tables. The limits for $\tfrac12\alpha = .25, .10, .025, .005$ can be calculated from
(1), (7), and Thompson's tables [13].





Since test VI will later be seen to have some merit we will discuss it somewhat
further at this point. In several statistics texts [e.g., 3, 14] the student is told
to take the quotient of the larger by the smaller of $s_1^2$, $s_2^2$, refer it to the F-table,
taking the $n_1$ of the table to be the $n_i$ of the numerator, and to reject the null
hypothesis $\theta = 1$ if the sample value is larger than the tabulated. It is then
further stated without proof that in using the 5% or 1% points of the F-table,
the significance level is actually 10% or 2%. Since the quotient thus referred
to the table is precisely the statistic $M$ of §6, it would seem logical to refer it
to an $M$-table rather than the F-table! However, the above procedure can be
justified⁷ as follows: The equation (7) tells us that test VI fulfills the symmetry
condition (4). It makes no difference then in his conclusions whether the
computer uses the statistic $s_1^2/s_2^2$ and the distribution $h_{n_1 n_2}(F)$ or $s_2^2/s_1^2$ and
$h_{n_2 n_1}(F)$. In particular he may always use the larger ratio and $h_{mn}(F)$, where
$m$ and $n$ are the "degrees of freedom" of numerator and denominator,
respectively. Since this statistic cannot fall in the lower tail, he need consider only
whether the calculated value exceeds the tabulated. But in using the value
tabulated as the upper $p\%$ point of the F-distribution, he makes his test at the $2p\%$
significance level.
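The $2p\%$ statement can be illustrated by simulation. The sketch below draws two samples of eleven observations each from the same normal population, so that $n_1 = n_2 = 10$, and refers the larger-over-smaller variance ratio to 2.98, which is assumed here to be the tabulated upper 5% point of F with (10, 10) degrees of freedom. The seed, trial count and function names are arbitrary.

```python
import random


def sample_variance(xs):
    """Unbiased sample variance with divisor N - 1."""
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs) / (len(xs) - 1)


def larger_over_smaller_rate(n_obs, crit, trials=20000, seed=1):
    """Fraction of trials with max(s1^2/s2^2, s2^2/s1^2) > crit when theta = 1."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        s1 = sample_variance([rng.gauss(0.0, 1.0) for _ in range(n_obs)])
        s2 = sample_variance([rng.gauss(0.0, 1.0) for _ in range(n_obs)])
        if max(s1 / s2, s2 / s1) > crit:
            hits += 1
    return hits / trials


# 2.98: assumed tabulated upper 5% point of F with (10, 10) degrees of freedom.
rate = larger_over_smaller_rate(n_obs=11, crit=2.98)
```

Under these assumptions the estimated rejection rate should come out near .10 rather than .05, in line with the argument above.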

9. Comparison of the tests and confidence intervals. We now have at hand
two one-tailed and four two-tailed tests, and corresponding sets of confidence
intervals, all based on the F-distribution. We note at this point that all four
of the two-tailed tests satisfy the symmetry condition (4), and that in the special
case $n_1 = n_2$ these four tests become identical. In comparing any two tests,
an instrument which makes their relative advantages completely anschaulich
is the power curve (surface in a more complicated case). The definition and
interpretation of the power curve of a test are based on the insight of Neyman
and Pearson [5] that two types of error are possible in applying a test: We
may (I) reject the hypothesis when it is true, or (II) accept it when it is false.

We see immediately that for any test of the class considered in §2, the
probability of a type I error is the same, namely $\alpha$. To find the probability of a
type II error, let us introduce a little more terminology: We denote by $E$ the
sample point $(E_1, E_2)$ and by $w$ the region of sample space defined by

$$T < A\theta_0 \quad \text{or} \quad T > B\theta_0. \tag{8}$$

$w$ is called the critical region of the test; the test rejects $H_0$ if and only if $E$ falls
in $w$. The probability of this, which is called the power of the test, is

$$1 - \Pr(A\theta_0/\theta \le T/\theta \le B\theta_0/\theta \mid \theta, \sigma_2, a_1, a_2).$$

Since in the present case this happens to be completely independent of the true
values of the nuisance parameters, even for $\theta \neq \theta_0$, let us write it as $P(w \mid \theta)$.
Then

$$P(w \mid \theta) = 1 - \int_{A\theta_0/\theta}^{B\theta_0/\theta} h_{n_1 n_2}(u)\,du. \tag{9}$$

⁷ The writer is indebted to Mr. T. W. Anderson, Jr. for pointing out to him that it is not
necessary to use the $M$-distribution.

Finally, by the power curve of the test we mean simply the graph of the power
$P(w \mid \theta)$ as a function of $\theta$.

We may now state the probability of a type II error; it is $1 - P(w \mid \theta)$, where
necessarily $\theta \neq \theta_0$. Hence the ordinate on the power curve for $\theta \neq \theta_0$ is the
probability of avoiding a type II error, while for $\theta = \theta_0$ it is the probability of
making a type I error. By inspection of equation (9) we find that, barring the
cases $B = \infty$ or $A = 0$ (tests I and II), $P(w \mid \theta) \to 1$ as $\theta \to 0$ or $\infty$. We
calculate the derivative to be

$$P'(w \mid \theta) = \Big[u\,h_{n_1 n_2}(u)/\theta\Big]_{u = A\theta_0/\theta}^{u = B\theta_0/\theta}, \tag{10}$$

which is obviously continuous for $0 < \theta < \infty$. If we equate this to zero we find
a unique solution for $\theta$, and hence the power curve has a single minimum point.
In the exceptional case $B = \infty$ we see from (9) that $P(w \mid \theta)$ decreases
monotonically from 1 to 0 as $\theta$ increases from 0 to $\infty$; in the case $A = 0$, $P(w \mid \theta)$
increases monotonically from 0 to 1. Some power curves⁸ are plotted in fig. 1.

[Fig. 1. Power curves $P(w \mid \theta)$.]
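Equation (9) can be evaluated directly once the F-distribution function is available. The sketch below integrates the density by Simpson's rule, which is adequate for $n_1 > 2$, and uses equal-tails limits for $n_1 = 10$, $n_2 = 20$, $\alpha = .05$ that are assumed from standard F-tables (upper 2.5% points 2.774 and 3.419). The names and the step count are illustrative choices.

```python
import math


def h(u, n1, n2):
    """F-density h_{n1 n2}(u); returns 0 for u <= 0 (adequate when n1 > 2)."""
    if u <= 0.0:
        return 0.0
    log_beta = (math.lgamma(n1 / 2) + math.lgamma(n2 / 2)
                - math.lgamma((n1 + n2) / 2))
    return math.exp(0.5 * n1 * math.log(n1 / n2) - log_beta
                    + (0.5 * n1 - 1.0) * math.log(u)
                    - 0.5 * (n1 + n2) * math.log1p(n1 * u / n2))


def cdf(u, n1, n2, steps=2000):
    """Simpson's rule for the integral of h over [0, u]; steps must be even."""
    if u <= 0.0:
        return 0.0
    width = u / steps
    total = h(0.0, n1, n2) + h(u, n1, n2)
    for i in range(1, steps):
        total += (4.0 if i % 2 else 2.0) * h(i * width, n1, n2)
    return total * width / 3.0


def power(theta, theta0, a, b, n1, n2):
    """Equation (9): 1 minus the F-probability of [A*theta0/theta, B*theta0/theta]."""
    lo, hi = a * theta0 / theta, b * theta0 / theta
    return 1.0 - (cdf(hi, n1, n2) - cdf(lo, n1, n2))


# Assumed equal-tails limits for n1 = 10, n2 = 20, alpha = .05 (see lead-in).
A_VI, B_VI = 1.0 / 3.419, 2.774
curve = [(t, power(t, 1.0, A_VI, B_VI, 10, 20))
         for t in (0.2, 0.5, 0.965, 1.0, 2.0, 5.0)]
```

With these assumed limits the ordinate at $\theta = \theta_0$ is close to $\alpha$, and the curve dips slightly lower just to the left of $\theta_0$, which is the bias of the equal-tails test discussed later in this section.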

Always understanding by $w$ a region of the set defined by (8), and recalling
the above interpretation of the ordinate on the power curve, we are led to ask
whether there is not a $w$, say $w_0$, whose power curve nowhere drops below any
other curve $P = P(w \mid \theta)$. (They all pass through $(\theta_0, \alpha)$.) The test based
on such a region $w_0$ would be called uniformly most powerful (UMP) of the class
considered, and obviously would be preferred under any circumstances. Alas,
it does not exist. Perhaps some insight into the fact of the general non-existence
of UMP tests can be gained by returning to fig. 1. While fig. 1 is for the case
$n_1 = 10$, $n_2 = 20$, and $\alpha = .05$, the following remarks are valid for any $n_1$, $n_2$, $\alpha$:
We note that for testing $H_0$ against alternatives $\theta < \theta_0$ test I is far superior to the
other three; indeed it is superior to any of the tests of the class defined by (8)
in the sense that its power curve lies above that of any of the other tests.⁹ But
for alternatives $\theta > \theta_0$, test I is seen to be very poor (the worst possible, it can
be shown). Similar remarks apply to test II and the complementary
alternatives. This constitutes the more convincing explanation promised in §3 of the
superiority of tests I and II in the "one-sided" cases. Since the power curve
of test I lies above all other power curves for $\theta < \theta_0$, and that of test II above
all for $\theta > \theta_0$, it is now clear that there is no UMP test of the class considered.

⁸ Power curves for test V may be found in a paper by Brown [1]. It did not seem
worthwhile to construct curves for test IV, since the limits are hard to compute, the test is biased,
and has little historical interest.

To cope with the commonly occurring situation where there is no UMP test,
Neyman and Pearson [5] defined an unbiased test, one whose power curve has
an absolute minimum at $\theta_0$. The desirability of an unbiased test in the
"two-sided" case is evident when we note that if a test is biased, the probability that
we accept the hypothesis $\theta = \theta_0$ is greater if $\theta$ has certain values $\theta \neq \theta_0$ than if
$\theta = \theta_0$. To find which, if any, of our tests is unbiased, we equate expression
(10) to zero for $\theta = \theta_0$. As a result we find¹⁰ the condition (5) which determines
test III.

We see now that the limits $A^{III}$, $B^{III}$ yield the preferred test in the "two-sided"
case, as well as the logarithmically shortest confidence intervals. However, as
pointed out in §5, the numerical values of these limits are difficult to calculate,
and the question then arises, do we lose much by using instead the easily
obtained "equal tails" limits $A^{VI}$, $B^{VI}$? In the case $n_1 = 10$, $n_2 = 20$, $\alpha = .05$,
fig. 1 shows that the power curves of tests III and VI differ very little. The
extent of the bias of test VI for other values of $n_1$, $n_2$, and $\alpha = .05$, $.01$ is
indicated in table I. (The missing diagonal entries are all 1, 5 or 1, 1.) Let
us call the entries $\beta$, $100\bar\alpha$, where $\beta = \theta_{\min}/\theta_0$, $\bar\alpha = P(w^{VI} \mid \theta_{\min})$. From (10)
and (1) we get the following formula for computing $\beta$:

$$\beta = (\mathfrak{B} - \mathfrak{A}Q)/(Q - 1),$$

where

$$Q = (\mathfrak{B}/\mathfrak{A})^{n_1/(n_1 + n_2)}, \qquad \mathfrak{A} = a/(1 - a), \qquad \mathfrak{B} = b/(1 - b),$$

and $a$ and $1 - b$ are the $100(\tfrac12\alpha)\%$ points on the incomplete Beta distribution
for $\nu_1 = n_2$, $\nu_2 = n_1$, and $\nu_1 = n_1$, $\nu_2 = n_2$, respectively, in the notation of
Thompson's tables [13]. $\bar\alpha$ may then be computed by transforming (9),

$$\bar\alpha = 1 - \Big[I_x(\tfrac12 n_1, \tfrac12 n_2)\Big]_{x = (1 + \beta/\mathfrak{A})^{-1}}^{x = (1 + \beta/\mathfrak{B})^{-1}},$$

⁹ The reader may prove this from (9) or note that it is a special case of the
results of §10.

¹⁰ The equivalent condition on the incomplete Beta distribution was given by Pitman
[10] for the case $\theta_0 = 1$.



TABLE I

Minimum points of power curves of test VI

The entries are $\theta_{\min}/\theta_0$, $100\,P(w^{VI} \mid \theta_{\min})$.
Roman type for $\alpha = .05$, bold face for $\alpha = .01$


''^'•^712 

1 

2 

3 

5 

10 

20 

40 

00 



634, 

576, 

.659, 

mi 

■ai 

581, 

588, 



4,76 

4 47 

4.17 

Eli 

iBI 

3 63 

3.61 

1 


.631, 

.677, 

.671, 

.696, 

.617, 

.630, 

.646, 



.946 

.883 

.808 

.740 

.706 

.687 

.670 


1.578, 


861, 

.779, 

.745, 

737, 

1^9 



4 75 


4.93 

4.69 

4.44 

4 26 



2 







^H|H 



1.685, 


.866, 

.776, 

.749, 

.749, 




.946 


.982 

.928 

.863 

.804 





1 161, 


.895, 

.838, 

.819, 

.812, 

.808, 



4.93 


4 92 

4.70 

4 51 

4.41 

4.29 

3 

1.734, 

1.170, 


.889, 


.821, 

.819, 

.820, 


.883 

.982 

1 

.978 

.917 

.867 

.837 

.804 


1.789, 

1 284, 

1.117, 


927, 

.898, 

886 , 

00 


4 17 

4 69 

4 92 


4.92 

4.78 

1 

4.67 

4.64 

5 

1.762, 

1.289, 

1.124, 


.924, 

1 .896, 

.887, 

.882, 


.808 

.928 

.978 




.903 

I .864 


1.771. 

1.342, 

1.194, 

1.079, 



.049, 

.941, 



4 44 


4.92 



4.80 

4.76 

10 






HM 




1.682, 

1.386, 

1.198, 

1.083, 



.049, 

.937, 


.740 

.863 

.917 

.976 


^^1 

.964 

.920 


1.742, 

1.367, 

1 .221, 

1 114, 

1.036, 


.083, 

.967, 


3.76 

4.26 

4 51 

4.78 

4.96 


4.98 

4.88 

20 










1.622, 

1.336, 

1.217, 

1.116, 

1.038, 


.983, 

.968, 


.706 

.804 

.867 

Hgl 

.987 


.993 

.960 


1.722, 

1.360, 

1 231, 

1.129, 

1 063, 

1.017, 


.984, 


.3.68 

4.15 

4.41 

4,67 

4.80 

4.98 


4.04 

40 

1.687, 

1.827, 

1.221, 

1.127, 


1.018, 


ftfti 
* VwV ^ 


.687 

.778 

.837 


IkJH 

.993 


.980 


1 700, 

1.360, 

1.238, 

1.140, 

ikiTBi 


ma 



3.61 

4 05 

4.29 

4.54 

4.76 

4.88 

4.94 


oo 

1.649, 

1.316, 

1.219, 

1.134, 

1.067, 


1.017, 



.670 

.761 

.804 

.864 

.926 

1 .960 

.980 



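and using Pearson's tables [9], or, when $x$ is very close to 0 or 1, using a few
terms of the series

$$I_x(\tfrac12 m, \tfrac12 n) = 1 - I_{1-x}(\tfrac12 n, \tfrac12 m) = \frac{x^{\frac12 m}}{B(\tfrac12 m, \tfrac12 n)} \left[ \frac{2}{m} - \frac{n - 2}{m + 2}\,\frac{x}{1!} + \frac{(n - 2)(n - 4)}{2(m + 4)}\,\frac{x^2}{2!} - \frac{(n - 2)(n - 4)(n - 6)}{2^2 (m + 6)}\,\frac{x^3}{3!} + \cdots \right].$$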



In computing $\beta_{n_1 n_2}$, $\bar\alpha_{n_1 n_2}$ it is perhaps simplest to take $n_1 > n_2$ and use the relationships

$$\beta_{n_2 n_1} = 1/\beta_{n_1 n_2}, \qquad \bar\alpha_{n_2 n_1} = \bar\alpha_{n_1 n_2}.$$

When sample sizes $n_1 + 1$, $n_2 + 1$ are such that table I indicates a large bias,
it might be worthwhile to get limits for an unbiased test from the "equal tails"
limits as follows: The limits $A^{III}$, $B^{III}$ for an unbiased test III may be obtained
by taking

$$A^{III} = A^{VI}/\beta, \qquad B^{III} = B^{VI}/\beta,$$

but the test will then be at significance level $\bar\alpha$. The gain in using $A^{III}$, $B^{III}$ instead
of $A^{VI}$, $B^{VI}$ is more apparent when we consider confidence intervals: The sets
associated with $A^{III}$, $B^{III}$ and $A^{VI}$, $B^{VI}$ have the same logarithmic lengths, but
the confidence coefficients are $1 - \bar\alpha$ and $1 - \alpha$, respectively.
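The formula for $\beta$ and the rescaling of the equal-tails limits can be sketched as follows. The input limits are assumed from standard F-tables for $n_1 = 10$, $n_2 = 20$, $\alpha = .05$ (upper 2.5% points 2.774 and 3.419), and the function names are illustrative. The code works with $\mathfrak{A} = n_1 A/n_2$ and $\mathfrak{B} = n_1 B/n_2$, which equal $a/(1-a)$ and $b/(1-b)$ under (1).

```python
def beta_shift(a_lim, b_lim, n1, n2):
    """theta_min / theta0 for the two-sided test with limits A, B on F."""
    frak_a = n1 * a_lim / n2                 # equals a/(1 - a) under (1)
    frak_b = n1 * b_lim / n2                 # equals b/(1 - b) under (1)
    q = (frak_b / frak_a) ** (n1 / (n1 + n2))
    return (frak_b - frak_a * q) / (q - 1.0)


def u_times_h_kernel(u, n1, n2):
    """u*h(u) up to a constant: u^(n1/2) * (1 + n1*u/n2)^(-(n1+n2)/2)."""
    return u ** (0.5 * n1) * (1.0 + n1 * u / n2) ** (-0.5 * (n1 + n2))


n1, n2 = 10, 20
a_vi, b_vi = 1.0 / 3.419, 2.774              # assumed equal-tails limits
beta = beta_shift(a_vi, b_vi, n1, n2)

# Rescaled limits: these satisfy condition (5), at level alpha-bar not alpha.
a_iii, b_iii = a_vi / beta, b_vi / beta
balance = u_times_h_kernel(a_iii, n1, n2) / u_times_h_kernel(b_iii, n1, n2)
```

Under these assumed inputs `beta` comes out a little below 1, and `balance` equals 1 up to rounding, which is condition (5) for the rescaled pair. The ratio $B/A$, and hence the logarithmic length of the confidence interval, is unchanged by the rescaling.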

This seems to be about as far as it is worthwhile to carry the developments
at the elementary level of part I. Some inadequacies may already have disturbed
the reader: Why not consider in place of the interval $(A, B)$ on the range of $F$
any measurable region¹¹ $R$ such that the integral of $h_{n_1 n_2}(F)$ over $R$ is $1 - \alpha$?
Under the transformation $T = \theta_0 F$ the complement of $R$, just as the complement
of $(A, B)$, would lead to critical regions $w$ for which $P(w \mid \theta_0) = \alpha$ for all values
of the nuisance parameters. Critical regions satisfying the last condition are
said to be similar to the sample space with regard to the nuisance parameters.
More generally, how would our preferred tests I, II, III stand up if we admit
for comparison tests based on any similar regions whatever? Finally, how
can one formulate in a general way conditions for optimum confidence intervals,
and would a more general formulation still lead to the preference of the sets
I, II, III? Answers to these questions will be found in part II.


Part II. Significance Tests and Confidence Intervals Based on any Similar Regions

10. Common best critical regions. For the case $\theta_0 = 1$, Neyman and Pearson
[6] have shown that the critical region of test I is the common best critical
(CBC) region for testing $H_0$ against alternatives $\theta < \theta_0$. This result is easily
extended to any $\theta_0$ by a simple device. We consider the following 1:1
transformations of variables and parameters:

$$x_{1j} = \theta_0^{1/2} x'_{1j}, \qquad x_{2k} = x'_{2k}, \qquad j = 1, 2, \cdots, N_1; \ k = 1, 2, \cdots, N_2, \tag{11}$$

$$\theta = \theta_0 \theta', \qquad \sigma_2^2 = (\sigma_2')^2, \qquad a_1 = \theta_0^{1/2} a_1', \qquad a_2 = a_2'. \tag{12}$$

Denote by $E_1'$, $E_2'$, $E'$ the points corresponding to $E_1$, $E_2$, $E$, respectively,
under the transformation (11), by $\vartheta$ any point in the space of the three nuisance
parameters, and by $\vartheta'$ its correspondent under the transformation (12), by
$H_0'$ the transformed hypothesis, $H_0'$: $\theta' = 1$, $\vartheta'$ unspecified. If $w$ is any
Borel-measurable region of the space of $E$, and $w'$ the map of $w$ under (11), then
$\Pr(E \in w \mid \theta, \vartheta) = \Pr(E' \in w' \mid \theta', \vartheta')$, which we shall write as

$$P(w \mid \theta, \vartheta) = P(w' \mid \theta', \vartheta'). \tag{13}$$

We note that the coordinates of $E_i'$ are normally distributed with mean $a_i'$
and variance $(\sigma_i')^2$, where $(\sigma_1')^2 = \theta'(\sigma_2')^2$, all $N_1 + N_2$ coordinates being
statistically independent. Designating the critical region of test I by $w_0$,
and its map under (11) by $w_0'$, the result of Neyman and Pearson may then be
stated as follows: $w_0'$ is a CBC region for $H_0'$ and alternatives $\theta' < 1$. Now
suppose $w_0$ were not a CBC region for $H_0$ and alternatives $\theta < \theta_0$. Then there
would exist a region $w_1$, a value $\theta_1 < \theta_0$, and a point $\vartheta_1$ such that $P(w_1 \mid \theta_1, \vartheta_1) >
P(w_0 \mid \theta_1, \vartheta_1)$, while $P(w_1 \mid \theta_0, \vartheta) = \alpha$ for all $\vartheta$. Let $w_1'$, $\theta_1'$, $\vartheta_1'$ correspond to
$w_1$, $\theta_1$, $\vartheta_1$ under (11) and (12). Then from (13) we would have that
$P(w_1' \mid \theta_1', \vartheta_1') > P(w_0' \mid \theta_1', \vartheta_1')$, where $\theta_1' < 1$, while $P(w_1' \mid 1, \vartheta') = \alpha$ for all $\vartheta'$.
But this would contradict the fact that $w_0'$ is a CBC region for $H_0'$ and alternatives
$\theta' < 1$.

¹¹ Our intuitions may balk at the notion of using sets $R$ more general than intervals, but
it would nevertheless be reassuring to find that our tests can meet this competition.

The proof that the critical region of test II is a CBC region for testing $H_0$
against alternatives $\theta > \theta_0$ is of course completely analogous. This establishes
the non-existence of a UMP test for $H_0$, and so we consider next the existence
of a "best" unbiased test.

11. Type $B_1$ region. This section is a direct application of a recent paper
"On the theory of testing composite hypotheses with one constraint" to which we
shall refer as [11]. Since it is not feasible to restate here the definitions,
assumptions, and theorems of [11], we shall refer to them by their numbers there. It is
convenient to transform the parameters of the p.d.f. of $E$ by putting

$$\theta = 1/\psi, \qquad \theta_0 = 1/\psi_0, \qquad \sigma_2^2 = 1/h. \tag{14}$$

Then

$$p(E \mid \psi, h, a_1, a_2) = (2\pi)^{-\frac12 N}\, \psi^{\frac12 N_1} h^{\frac12 N} \exp\left(-\tfrac12 \left\{ \psi h [N_1(\bar{x}_1 - a_1)^2 + S_1] + h [N_2(\bar{x}_2 - a_2)^2 + S_2] \right\}\right), \tag{15}$$

where

$$N = N_1 + N_2, \qquad S_i = n_i s_i^2.$$

We note that type $B$ and type $B_1$ regions (definitions 1, 2 in [11]) are invariant
under certain transformations of parameters: Suppose new parameters $\theta'$, $\vartheta'$
are introduced by 1:1 transformations $\theta = \theta(\theta')$, $\vartheta = \vartheta(\vartheta')$. Let $\theta_0'$ correspond
to $\theta_0$, and consider the transformed hypothesis $H_0'$: $\theta' = \theta_0'$; $\vartheta'$ unspecified.
Sufficient conditions that a region be of type $B$ for testing $H_0$ if it is of type $B$
for testing $H_0'$ are that the function $\theta(\theta')$ have first and second derivatives and
that the first not vanish at $\theta_0'$. The last statement remains true if $B$ is replaced
by $B_1$. Since the transformations (14) satisfy these sufficient conditions, we
define

$$H_0': \quad \psi = \psi_0; \quad \vartheta' = (h, a_1, a_2), \ \text{unspecified},$$

and propose to show that there exists a type $B_1$ region for testing $H_0'$, and that
it is the critical region of test III.

For later reference we now note that the four functions of variables and
parameters defined in Table II are mutually independently distributed as
indicated there.


TABLE II

Function | Distribution
$U_1 = \psi h S_1 = S_1/\sigma_1^2$ | $\chi^2$, with $n_1$ degrees of freedom
$U_2 = h S_2 = S_2/\sigma_2^2$ | $\chi^2$, with $n_2$ degrees of freedom
$u_3 = (\psi h N_1)^{1/2}(\bar{x}_1 - a_1) = N_1^{1/2}(\bar{x}_1 - a_1)/\sigma_1$ | normal, with zero mean and unit variance
$u_4 = (h N_2)^{1/2}(\bar{x}_2 - a_2) = N_2^{1/2}(\bar{x}_2 - a_2)/\sigma_2$ | normal, with zero mean and unit variance


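Let us first verify the critical assumption 3° of [11]: Identifying our $\psi$, $h$, $a_1$, $a_2$
with $\theta_1$, $\theta_2$, $\theta_3$, $\theta_4$ of [11], we find from (15) that

$$\begin{aligned}
\phi_1 &= \tfrac12 \{ N_1/\psi - h[N_1(\bar{x}_1 - a_1)^2 + S_1] \}, \\
\phi_2 &= \tfrac12 \{ N/h - \psi[N_1(\bar{x}_1 - a_1)^2 + S_1] - [N_2(\bar{x}_2 - a_2)^2 + S_2] \}, \\
\phi_3 &= \psi h N_1 (\bar{x}_1 - a_1), \\
\phi_4 &= h N_2 (\bar{x}_2 - a_2),
\end{aligned} \tag{16}$$

and then check 3° by differentiating equations (16).

To verify assumption 4°, let $x_1$, $x_2$, $x_3$, $x_4$ of [11] be our $x_{11}$, $x_{12}$, $x_{21}$, $x_{22}$,
respectively. We calculate

$$\frac{\partial(\phi_1, \phi_2, \phi_3, \phi_4)}{\partial(x_1, x_2, x_3, x_4)} = -\psi h^3 (x_1 - x_2)(x_3 - x_4),$$

which vanishes only on the same set of probability zero for all admissible values
of the parameters. The validity of assumption 5° follows from §6 of [11], and
there is no difficulty in verifying 1° and 2°.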

To apply theorem 1 of [11] we must find functions $k_i(\phi_2, \phi_3, \phi_4; \psi_0, \vartheta')$,
$i = 1, 2$, such that

$$\int_{k_1}^{k_2} \phi_1^i\, p(\phi_1, \phi_2, \phi_3, \phi_4 \mid \psi_0, \vartheta')\,d\phi_1 = (1 - \alpha) \int_{-\infty}^{+\infty} \text{same}, \tag{17}$$

for $i = 0, 1$, where the symbols $\phi_i$ henceforth are understood to stand for the
functions (16) with $\psi$ replaced by $\psi_0$. If the functions $k_i$ exist, then the region in
sample space defined by

$$\phi_1 < k_1 \quad \text{or} \quad \phi_1 > k_2 \tag{18}$$

is independent of $\vartheta'$ and of type $B$.

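From equations (16) and Table II we see that

$$\begin{aligned}
\phi_1 &= \tfrac12 (N_1 - \bar{u}_1)/\psi_0, & \phi_2 &= \tfrac12 (N - \bar{u}_2)/h, \\
\phi_3 &= (\psi_0 h N_1)^{1/2} u_3, & \phi_4 &= (h N_2)^{1/2} u_4,
\end{aligned} \tag{19}$$

where

$$\bar{u}_1 = U_1 + u_3^2, \qquad \bar{u}_2 = U_1 + U_2 + u_3^2 + u_4^2,$$

and $\psi$ is put equal to $\psi_0$ in $U_1$, $u_3$. Furthermore, for fixed $\bar{u}_2$, $u_3$, $u_4$, the range
of $\bar{u}_1$ is

$$u_3^2 \le \bar{u}_1 \le \bar{u}_2 - u_4^2.$$

Transforming the integrals in (17) by substituting (19) and

$$p(\phi_1, \phi_2, \phi_3, \phi_4 \mid \psi_0, \vartheta') = \frac{p(U_1, U_2, u_3, u_4 \mid \psi_0, \vartheta')}{\left| \dfrac{\partial(\phi_1, \phi_2, \phi_3, \phi_4)}{\partial(\bar{u}_1, \bar{u}_2, u_3, u_4)} \cdot \dfrac{\partial(\bar{u}_1, \bar{u}_2, u_3, u_4)}{\partial(U_1, U_2, u_3, u_4)} \right|},$$

where the p.d.f. in the numerator is, from Table II,

$$C\, U_1^{\frac12 n_1 - 1} U_2^{\frac12 n_2 - 1} \exp(-\tfrac12 \bar{u}_2),$$

we get as the equivalent of (17)

$$\int_{K_1}^{K_2} (N_1 - \bar{u}_1)^i (\bar{u}_1 - u_3^2)^{\frac12 n_1 - 1} (\bar{u}_2 - u_4^2 - \bar{u}_1)^{\frac12 n_2 - 1}\,d\bar{u}_1 = (1 - \alpha) \int_{u_3^2}^{\bar{u}_2 - u_4^2} \text{same},$$

with $K_i(\bar{u}_2, u_3, u_4; \psi_0, \vartheta')$ corresponding to $k_i(\phi_2, \phi_3, \phi_4; \psi_0, \vartheta')$ under (19).

Finally, we let

$$x = (\bar{u}_1 - u_3^2)/(\bar{u}_2 - u_3^2 - u_4^2), \tag{20}$$

and get

$$\int_{\kappa_1}^{\kappa_2} [N_1 - u_3^2 - (\bar{u}_2 - u_3^2 - u_4^2)x]^i\, x^{\frac12 n_1 - 1} (1 - x)^{\frac12 n_2 - 1}\,dx = (1 - \alpha) \int_0^1 \text{same},$$

where $\kappa_i(\bar{u}_2, u_3, u_4; \psi_0, \vartheta')$ are the values of $x$ obtained by setting $\bar{u}_1$ equal to
the function $K_i$ in (20). The last condition is equivalent to

$$\int_{\kappa_1}^{\kappa_2} x^{\frac12 n_1 - 1 + i} (1 - x)^{\frac12 n_2 - 1}\,dx = (1 - \alpha) \int_0^1 \text{same}, \qquad i = 0, 1. \tag{21}$$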





Since $x$ is a continuous monotonic function of $\phi_1$, (18) becomes

(22) $x < \kappa_1$ and $x > \kappa_2$.

Solutions for the functions $\kappa_i$ satisfying (21) exist in the form $\kappa_i$ = constant.
Indeed, if we now note that the $x$ defined by (20) is the same as that defined in
(1), and let $\kappa_1 = a$, $\kappa_2 = b$, we see that the conditions (21) are identical with
(0), and that our method of finding type B regions has led us to the critical
region of test III.

To show that the type B region obtained from Theorem 1 of [11] is also of
type $B_1$, we appeal to Theorem 2: From (15) we have

p{E I &')/p{E I , d') = imot''' exp {~ mM 1. 

Since for $\psi \neq \psi_0$ this function is convex in $\phi_1$, Theorem 2 is applicable. The
result of this section is the conclusion that the critical region of test III is of
type $B_1$ for testing $H_0$.

12. Neyman's categories of confidence intervals. The concepts and terminology
of this section are those formulated in a basic paper [4] by Neyman.
Suppose a distribution depends on a parameter $\theta$, and on further parameters
$\vartheta_1, \vartheta_2, \ldots$, which we shall symbolize by $\vartheta$. The hypothesis

$H(\theta_0)$: $\theta = \theta_0$; $\vartheta$ unspecified,

may be called a composite hypothesis with one constraint [11]. Let $E$ be the
sample point, $W$ be the sample space, and $w$ be any Borel-measurable region in $W$.
Write $\Pr\{E \in w \mid \theta, \vartheta\} = P\{w \mid \theta, \vartheta\}$. The condition that a critical region $w(\theta_0)$
for testing $H(\theta_0)$ be similar to $W$ with respect to $\vartheta$ is

(23) $P\{w(\theta_0) \mid \theta_0, \vartheta\} = \alpha$ for all $\vartheta$,

where $\alpha$ is fixed throughout our discussion. Suppose for every admissible $\theta_0$
there exists a similar region $w(\theta_0)$. The complementary region $A(\theta_0) =
W - w(\theta_0)$ we may call a region of acceptance. For any $E$ we next define the
linear set $\delta(E)$ of points on the $\theta$-axis as the totality of points $\theta$ such that $E \in A(\theta)$.
The probability [4] that the random set $\delta(E)$ cover a value $\theta''$ if the true value
of $\theta$ is $\theta'$ is

(24) $\Pr\{\theta'' \in \delta(E) \mid \theta', \vartheta\} = 1 - P\{w(\theta'') \mid \theta', \vartheta\},$

and hence from (23),

(25) $\Pr\{\theta' \in \delta(E) \mid \theta', \vartheta\} = 1 - \alpha$

for all $\theta'$, $\vartheta$, and we might call the aggregate $\{\delta(E)\}$ a set of confidence regions
with confidence coefficient $1 - \alpha$. Now if all $\delta(E)$ are intervals, then they form
a set of confidence intervals.
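
The construction of $\delta(E)$ from the family of acceptance regions can be sketched numerically. The following is only an illustration under simplifying assumptions that are not in the text: the test is a two-sided z-test for a normal mean with known standard deviation (so no nuisance parameter is present), and the data, the grid, and the names `accepts` and `confidence_set` are hypothetical.

```python
from statistics import NormalDist, mean


def accepts(sample, theta0, sigma, alpha):
    """Return True when the sample point E lies in the acceptance region A(theta0).

    Illustrative two-sided z-test of H(theta0): mean = theta0, with sigma known.
    """
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return abs(mean(sample) - theta0) <= z * sigma / len(sample) ** 0.5


def confidence_set(sample, sigma, alpha, grid):
    """delta(E): every grid value theta for which E falls in A(theta)."""
    return [theta for theta in grid if accepts(sample, theta, sigma, alpha)]


sample = [4.1, 5.3, 4.8, 5.9, 5.2, 4.6, 5.0, 5.5]   # made-up data
grid = [i / 1000 for i in range(3000, 7001)]          # candidate values of theta
delta = confidence_set(sample, sigma=0.6, alpha=0.05, grid=grid)
lower, upper = delta[0], delta[-1]                    # end points of delta(E)
```

In this simple case the accepted grid values form one unbroken run, so $\delta(E)$ is an interval, and its end points agree (to the grid spacing) with the familiar limits "sample mean plus or minus $z \sigma/\sqrt{n}$".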

We have now shown that if $H(\theta_0)$ is a composite hypothesis with one constraint,
if for every admissible $\theta_0$ there exists a similar region $w(\theta_0)$ for testing
$H(\theta_0)$, and if the aggregate $\{\delta(E)\}$ determined by the family $\{w(\theta_0)\}$ consists of
intervals $\delta(E)$, then $\{\delta(E)\}$ is a set of confidence intervals. By similar use of
(24) the reader may prove that if furthermore each $w(\theta_0)$ of the family has the
property P of the table below, then the corresponding set $\{\delta(E)\}$ of confidence
intervals is of Neyman's category C.


P: property of $w(\theta_0)$                                | C: category of $\{\delta(E)\}$
gives UMP test                                              | shortest
BCR for $\theta > \theta_0$ (or $\theta < \theta_0$)        | best one-sided
gives unbiased test                                         | unbiased
of type B                                                   | short unbiased
of type $B_1$                                               | shortest unbiased


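We have taken the liberty of calling a set of one-sided confidence intervals

$\delta(E)$: $\underline{\theta}(E) \le \theta$ (or $\theta \le \bar{\theta}(E)$),

where $\underline{\theta}(E)$ and $\bar{\theta}(E)$ are Neyman's unique lower and upper estimates,
respectively, best one-sided, and of calling a set $\{\delta_0(E)\}$ shortest unbiased if for all $\theta'$, $\vartheta$
it satisfies (25) and

(26) $\left[\partial \Pr\{\theta' \in \delta_0(E) \mid \theta, \vartheta\} / \partial\theta\right]_{\theta = \theta'} = 0,$

while for any other set $\{\delta_1(E)\}$ satisfying (25) and (26), and all $\theta''$, $\theta'$, $\vartheta$,

$\Pr\{\theta'' \in \delta_0(E) \mid \theta', \vartheta\} \le \Pr\{\theta'' \in \delta_1(E) \mid \theta', \vartheta\}.$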

It follows immediately from this discussion that our sets II and I of confidence
intervals are the best one-sided, and that the set III is not only a short,
but the shortest, unbiased set.

In conclusion, we remark that Neyman's concept of the “shortness” of a set
of confidence intervals strikes one at first as indirect (to fully appreciate its
elegance it is perhaps necessary to attempt the formulation of a general theory
from a more naive approach), and that it is then of interest to discover that
in the present case his short unbiased set coincides with that reached by the
direct intuitive (but obviously extremely limited) method of §5.

REFERENCES 

[1] G. W. Brown, “On the power of the L1 test for equality of several variances,” Annals
of Math. Stat., Vol. 10 (1939), p. 127.

[2] R. A. Fisher, “On a distribution yielding the error function of several well known
statistics,” Proc. Int. Math. Congress, Toronto, 1924, Vol. 2, p. 808.

[3] J. F. Kenney, Mathematics of Statistics, part 2, N. Y., 1939, p. 144.

[4] J. Neyman, “Outline of a theory of statistical estimation based on the classical theory
of probability,” Phil. Trans. Roy. Soc. London, ser. A, Vol. 236 (1937),
pp. 333-380.

[5] J. Neyman and E. S. Pearson, “Contributions to the theory of testing statistical
hypotheses, part I,” Stat. Res. Mem., Vol. 1 (1936), pp. 1-37.

[6] J. Neyman and E. S. Pearson, “On the problem of the most efficient tests of statistical
hypotheses,” Phil. Trans. Roy. Soc. London, ser. A, Vol. 231 (1933), pp. 289-337.

[7] E. S. Pearson and J. Neyman, “On the problem of two samples,” Bull. Int. Acad.
Polon. Sc. Let., ser. A, 1930, p. 82.

[8] K. Pearson, S. A. Stouffer, and F. N. David, “Further applications in statistics of
the $T_m(x)$ Bessel function,” Biometrika, Vol. 24 (1932), pp. 306, 339, 340.

[9] K. Pearson (Editor), Tables of the Incomplete Beta Function, Cambridge, 1934.

[10] E. J. G. Pitman, “Tests of hypotheses concerning location and scale parameters,”
Biometrika, Vol. 31 (1939), p. 207.

[11] H. Scheffé, “On the theory of testing composite hypotheses with one constraint,”
Annals of Math. Stat., Vol. 13 (1942), pp. 280-293.

[12] G. W. Snedecor, Statistical Methods, Ames, 1940, pp. 184-187.

[13] C. M. Thompson, “Tables of percentage points of the incomplete Beta function,”
Biometrika, Vol. 32 (1941), pp. 168-181.

[14] L. H. C. Tippett, The Methods of Statistics, London, 1937, p. 118.



SETTING OF TOLERANCE LIMITS WHEN THE SAMPLE IS LARGE 

By Abraham Wald
Columbia University 

1. Introduction. Let $f(x_1, \ldots, x_p, \theta_1, \ldots, \theta_k)$ be the joint probability
density function of the variates $x_1, \ldots, x_p$ involving $k$ unknown parameters
$\theta_1, \ldots, \theta_k$. A sample of size $n$ is drawn from this population. Denote by
$x_{i\alpha}$ ($i = 1, \ldots, p$; $\alpha = 1, \ldots, n$) the $\alpha$-th observation on $x_i$. We will deal here
with the following two problems of setting tolerance limits, which are of
importance in the mass production of a product:

Problem 1. For any two positive numbers $\beta < 1$ and $\gamma < 1$ we have to construct
$p$ pairs of functions of the observations $L_i(x_{11}, \ldots, x_{pn})$ and
$U_i(x_{11}, \ldots, x_{pn})$ ($i = 1, \ldots, p$) such that

(1) $P\left[\int_{L_p}^{U_p} \cdots \int_{L_1}^{U_1} f(x_1, \ldots, x_p, \theta_1, \ldots, \theta_k)\, dx_1 \cdots dx_p > \gamma \,\Big|\, \theta_1, \ldots, \theta_k\right] = \beta,$

where for any relation $R$, $P(R \mid \theta_1, \ldots, \theta_k)$ denotes the probability that $R$ holds,
calculated under the assumption that $\theta_1, \ldots, \theta_k$ are the true values of the parameters.

Problem 2. For any positive numbers $\beta < 1$, $\lambda < 1$ and for any positive integer
$N$ we have to construct $p$ pairs of functions of the observations $L_i(x_{11}, \ldots, x_{pn})$ and
$U_i(x_{11}, \ldots, x_{pn})$ with the following property: Let $y_{i\alpha}$ ($i = 1, \ldots, p$; $\alpha =
1, \ldots, N$) be the $\alpha$-th observation on the variate $x_i$ in a second sample of size $N$
drawn from the same population as the first sample has been drawn. Denote by $M$
the number of different values of $\alpha$ for which the $p$ inequalities

$L_i(x_{11}, \ldots, x_{pn}) \le y_{i\alpha} \le U_i(x_{11}, \ldots, x_{pn})$ ($i = 1, \ldots, p$)

are fulfilled. Then

(2) $P(M > \lambda N \mid \theta_1, \ldots, \theta_k) = \beta,$

where $\theta_1, \ldots, \theta_k$ denote the unknown parameter values of the population from
which the observations $x_{i\alpha}$ and $y_{i\alpha}$ have been drawn.

The functions $L_i$ and $U_i$ are called the tolerance limits for the variate $x_i$.
We will say that $L_i$ is the lower, and $U_i$ the upper tolerance limit of $x_i$. In
general, there exist infinitely many tolerance limits $L_i$ and $U_i$ which are
solutions of Problem 1 or Problem 2. It is clear that the tolerance limits $L_i$ and
$U_i$ are the more favorable the smaller the difference $U_i - L_i$. Hence if there
exist several solutions for the tolerance limits $L_i$ and $U_i$ we should select that
one for which the difference $U_i - L_i$ becomes a minimum in some sense.

S. S. Wilks¹ gave a solution of Problems 1 and 2 in the univariate case, i.e.
if $p = 1$. It seems that Wilks' solution is the best possible one if nothing is
known about the probability density function except that it is continuous.
However, if it is known a priori that the unknown density function is an
element of a $k$-parameter family of functions, it will in general be possible to derive
tolerance limits which are considerably better than those proposed by Wilks.

¹ S. S. Wilks, “Determination of sample sizes for setting tolerance limits,” Annals of
Math. Stat., Vol. 12 (1941). See also his paper on the same subject presented at the meeting
of the Institute of Mathematical Statistics in Poughkeepsie, September, 1942.

Wilks' results can easily be extended to the multivariate case, provided the
variates $x_1, \ldots, x_p$ are known to be independently distributed.² This is a
serious restriction, since in many practical cases the independence of the variates
$x_1, \ldots, x_p$ cannot be assumed. The case of dependent variates has not been
treated by Wilks.

In this paper we give a solution of Problems 1 and 2 when the size $n$ of the
sample is large. In the next section a lemma is proved which will be used in
the derivation of tolerance limits. In section 3 the univariate case is treated
and in section 4 the results are extended to the multivariate case.


2. A lemma. We will prove the following

Lemma. Let $\{x_{1n}\}, \ldots, \{x_{rn}\}$ ($n = 1, 2, \ldots$, ad inf.) be $r$ sequences of random
variables and let $a_1, \ldots, a_r$ be $r$ constants such that the joint distribution of
$\sqrt{n}(x_{1n} - a_1), \ldots, \sqrt{n}(x_{rn} - a_r)$ converges with $n \to \infty$ towards the $r$-variate
normal distribution with zero means and finite non-singular covariance matrix
$\|\sigma_{ij}\|$ ($i, j = 1, \ldots, r$). Furthermore, let $g(u_1, \ldots, u_r)$ be a function of $r$
variables $u_1, \ldots, u_r$ which admits continuous first derivatives in the neighborhood
of the point $u_1 = a_1, \ldots, u_r = a_r$. Assume that at least one of the first partial
derivatives of $g(u_1, \ldots, u_r)$ is not zero at the point $u_1 = a_1, \ldots, u_r = a_r$. Then
the distribution of $\sqrt{n}\,[g(x_{1n}, \ldots, x_{rn}) - g(a_1, \ldots, a_r)]$ converges with $n \to \infty$
towards the normal distribution with zero mean and variance

$\sigma^2 = \sum_{j=1}^{r} \sum_{i=1}^{r} \sigma_{ij}\, g_i\, g_j,$

where $g_i$ denotes the partial derivative of $g(u_1, \ldots, u_r)$ with respect to $u_i$ taken at
$u_1 = a_1, \ldots, u_r = a_r$.

Proof: Since the joint distribution of $\sqrt{n}(x_{1n} - a_1), \ldots, \sqrt{n}(x_{rn} - a_r)$
approaches an $r$-variate normal distribution with zero means and finite
non-singular covariance matrix, the probability that


(3) 


a. - 




^ ^ a + 


1 


(i = 1, • • ■ , r) 


holds, converges to 1 with $n \to \infty$. From (3) and the continuity of the first
derivatives of $g(u_1, \ldots, u_r)$ it follows easily that for any positive $\epsilon$ the
probability that

(4) $\sum_{i=1}^{r} \sqrt{n}\,(x_{in} - a_i)\, g_i - \epsilon < \sqrt{n}\,[g(x_{1n}, \ldots, x_{rn}) - g(a_1, \ldots, a_r)] < \sum_{i=1}^{r} \sqrt{n}\,(x_{in} - a_i)\, g_i + \epsilon$

holds, converges to 1 with $n \to \infty$. Since the limit distribution of
$\sum_i \sqrt{n}\,(x_{in} - a_i)\, g_i$ is normal with zero mean and variance equal to $\sum_j \sum_i \sigma_{ij}\, g_i\, g_j$,
our Lemma follows easily from the fact that the quantity $\epsilon$ in (4) can be chosen
arbitrarily small.

² This was mentioned by Wilks in his paper presented at the meeting of the Institute of
Mathematical Statistics in Poughkeepsie, N. Y., September, 1942.
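
The Lemma can be checked by simulation in a small case. The sketch below is illustrative only and rests on assumptions that are not in the text: $r = 2$, the two sequences are means of independent normal observations, and $g(u_1, u_2) = u_1 u_2$. With $g_1 = a_2$, $g_2 = a_1$ and a diagonal covariance matrix, the Lemma gives the limit variance $a_2^2 \sigma_{11} + a_1^2 \sigma_{22}$.

```python
import math
import random
from statistics import variance

A1, A2 = 2.0, 3.0      # limits a_1, a_2 of the two sequences (assumed values)
S1, S2 = 1.0, 0.5      # standard deviations; the two coordinates are independent


def g(u1, u2):
    """Illustrative smooth function with non-zero first derivatives at (A1, A2)."""
    return u1 * u2


def limit_variance():
    """sum_ij sigma_ij g_i g_j with g_1 = A2, g_2 = A1 and sigma_12 = 0."""
    g1, g2 = A2, A1
    return g1 * g1 * S1 ** 2 + g2 * g2 * S2 ** 2


def simulate(n, reps, seed=0):
    """Draw reps values of sqrt(n) * [g(x_1n, x_2n) - g(a_1, a_2)]."""
    rng = random.Random(seed)
    root_n = math.sqrt(n)
    draws = []
    for _ in range(reps):
        # The mean of n normal observations is itself exactly normal,
        # so each sample mean is drawn directly.
        x1n = rng.gauss(A1, S1 / root_n)
        x2n = rng.gauss(A2, S2 / root_n)
        draws.append(root_n * (g(x1n, x2n) - g(A1, A2)))
    return draws


draws = simulate(n=400, reps=20000)
empirical = variance(draws)   # should lie close to limit_variance()
```

For these assumed values the limit variance is 10, and the empirical variance of the simulated draws should lie close to it for large $n$.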


3. The univariate case. In this section we assume that $p = 1$. Hence the
probability density function $f(x_1, \ldots, x_p, \theta_1, \ldots, \theta_k)$ is replaced by the
univariate density function $f(x, \theta_1, \ldots, \theta_k)$. In order to simplify the notations,
the letter $\theta$ without any subscript will be used to denote the set of parameter
values $\theta_1, \ldots, \theta_k$.

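For any positive $\xi < 1$ let $\phi(\theta, \xi)$ and $\psi(\theta, \xi)$ be two functions of $\theta$ such that

(5) $\int_{\phi(\theta, \xi)}^{\psi(\theta, \xi)} f(x, \theta)\, dx = \xi.$

If $f(x, \theta)$ is a continuous function of $x$, functions $\phi(\theta, \xi)$ and $\psi(\theta, \xi)$ satisfying (5)
exist. It is clear that for any function $\phi(\theta, \xi)$ subject to the condition

$\int_{-\infty}^{\phi(\theta, \xi)} f(x, \theta)\, dx \le 1 - \xi$

there exists a function $\psi(\theta, \xi)$ such that (5) holds. We will choose $\phi(\theta, \xi)$ and
$\psi(\theta, \xi)$ so that (5) is satisfied and

(6) $\psi(\theta, \xi) - \phi(\theta, \xi) \le \bar{\psi}(\theta, \xi) - \bar{\phi}(\theta, \xi)$

for any value of $\theta$ and for any functions $\bar{\phi}(\theta, \xi)$ and $\bar{\psi}(\theta, \xi)$ which satisfy (5).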

Let $\hat{\theta}_i$ ($i = 1, \ldots, k$) be the maximum likelihood estimate of $\theta_i$ calculated
from the observations $x_{11}, \ldots, x_{pn}$. We propose the use of the tolerance
limits

(7) $L = \phi(\hat{\theta}, \xi)$ and $U = \psi(\hat{\theta}, \xi)$

where the value of the constant $\xi$ has to be properly determined. Problem 1
is solved if we can determine $\xi$ as a function of $\beta$ and $\gamma$ such that

(8) $P\left[\int_{\phi(\hat{\theta}, \xi)}^{\psi(\hat{\theta}, \xi)} f(x, \theta)\, dx > \gamma \,\Big|\, \theta\right] = \beta.$

Problem 2 is solved if we determine $\xi$ as a function of $\beta$, $\lambda$ and $N$ such that

(9) $P(M > \lambda N \mid \theta) = \beta$

where $M$ denotes the number of observations in the second sample which lie
between the tolerance limits $\phi(\hat{\theta}, \xi)$ and $\psi(\hat{\theta}, \xi)$. The use of tolerance limits
of the form (7) seems to be well justified by the fact that the functions $\phi(\theta, \xi)$
and $\psi(\theta, \xi)$ satisfy (5) and (6) and that $\hat{\theta}_i$ is an optimum estimate of $\theta_i$ ($i =
1, \ldots, k$).

Now we will derive the large sample distribution of

(10) $I(\hat{\theta}, \theta, \xi) = \int_{\phi(\hat{\theta}, \xi)}^{\psi(\hat{\theta}, \xi)} f(x, \theta)\, dx.$

We obviously have

(11) $I(\theta, \theta, \xi) = \xi.$

We will assume that the limit joint distribution of $\sqrt{n}(\hat{\theta}_1 - \theta_1), \ldots,
\sqrt{n}(\hat{\theta}_k - \theta_k)$ is normal with mean values 0 and non-singular covariance matrix
$\|\sigma_{ij}(\theta)\| = \|c_{ij}(\theta)\|^{-1}$ where $c_{ij}(\theta)$ denotes the expected value of
$-\partial^2 \log f(x, \theta) / \partial\theta_i\, \partial\theta_j$
($i, j = 1, \ldots, k$). This is known to be true if $f(x, \theta)$ satisfies some regularity
conditions.³ Furthermore we assume that $\phi(\theta, \xi)$ and $\psi(\theta, \xi)$ admit continuous
first partial derivatives with respect to $\theta_1, \ldots, \theta_k$ and that $f(x, \theta)$ is a continuous
function of $x$ in the neighborhood of $x = \phi(\theta, \xi)$ and $x = \psi(\theta, \xi)$. We have


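(12) $\frac{\partial I(\hat{\theta}, \theta, \xi)}{\partial \hat{\theta}_i} = \frac{\partial \psi(\hat{\theta}, \xi)}{\partial \hat{\theta}_i}\, f[\psi(\hat{\theta}, \xi), \theta] - \frac{\partial \phi(\hat{\theta}, \xi)}{\partial \hat{\theta}_i}\, f[\phi(\hat{\theta}, \xi), \theta].$

Assuming that at least one of the derivatives $\left[\partial I(\hat{\theta}, \theta, \xi) / \partial \hat{\theta}_i\right]_{\hat{\theta} = \theta}$ is not zero, it
follows from our Lemma that
$\sqrt{n}\,[I(\hat{\theta}, \theta, \xi) - I(\theta, \theta, \xi)] = \sqrt{n}\,[I(\hat{\theta}, \theta, \xi) - \xi]$ is in the limit normally
distributed with zero mean and variance

(13) $\sigma^2(\theta, \xi) = \{f[\psi(\theta, \xi), \theta]\}^2 \sum_j \sum_i \frac{\partial \psi(\theta, \xi)}{\partial \theta_i} \frac{\partial \psi(\theta, \xi)}{\partial \theta_j}\, \sigma_{ij}(\theta)$
$\qquad - 2 f[\psi(\theta, \xi), \theta]\, f[\phi(\theta, \xi), \theta] \sum_j \sum_i \frac{\partial \psi(\theta, \xi)}{\partial \theta_i} \frac{\partial \phi(\theta, \xi)}{\partial \theta_j}\, \sigma_{ij}(\theta)$
$\qquad + \{f[\phi(\theta, \xi), \theta]\}^2 \sum_j \sum_i \frac{\partial \phi(\theta, \xi)}{\partial \theta_i} \frac{\partial \phi(\theta, \xi)}{\partial \theta_j}\, \sigma_{ij}(\theta).$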

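For any positive $\beta < 1$ denote by $\lambda_\beta$ the value for which

(14) $\frac{1}{\sqrt{2\pi}} \int_{\lambda_\beta}^{\infty} e^{-t^2/2}\, dt = \beta.$

Then the probability that

(15) $I(\hat{\theta}, \theta, \xi) > \xi + \frac{\lambda_\beta\, \sigma(\theta, \xi)}{\sqrt{n}}$

converges with $n \to \infty$ towards $\beta$.
Let

(16) $\xi(\beta, \gamma, \theta) = \gamma - \frac{\lambda_\beta\, \sigma(\theta, \gamma)}{\sqrt{n}}.$

³ See for instance J. L. Doob, “Probability and statistics,” Trans. Amer. Math. Soc.,
October, 1934.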




If $\sigma(\theta, \xi)$ is continuous in $\theta$ and $\xi$, it follows easily from (15) that the probability
that

(17) $I[\hat{\theta}, \theta, \xi(\beta, \gamma, \hat{\theta})] > \gamma$

holds, converges to $\beta$ with $n \to \infty$. Hence we can summarize our results in the
following

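Theorem 1: Let $\phi(\theta, \xi)$ and $\psi(\theta, \xi)$ be two functions satisfying (5) and (6).
Furthermore, let the functions $I(\hat{\theta}, \theta, \xi)$, $\sigma^2(\theta, \xi)$ and $\xi(\beta, \gamma, \theta)$ be defined by (10),
(13) and (16) respectively. Denote by $\theta_1^0, \ldots, \theta_k^0$ the true values of the parameters.
It is assumed that there exist two positive numbers $\epsilon$ and $\delta$ such that the following
three conditions are fulfilled.

(a) For any point $\theta$ for which $\sum_{i=1}^{k} (\theta_i - \theta_i^0)^2 \le \epsilon$ the limit joint distribution of
$\sqrt{n}(\hat{\theta}_1 - \theta_1), \ldots, \sqrt{n}(\hat{\theta}_k - \theta_k)$, calculated under the assumption that $\theta$ is the
true parameter point, is normal with zero means and a finite non-singular covariance
matrix $\|\sigma_{ij}(\theta)\|$ where $\sigma_{ij}(\theta)$ is a continuous function of $\theta$ in the domain
$\sum_i (\theta_i - \theta_i^0)^2 < \epsilon$.

(b) The partial derivatives $\partial I(\hat{\theta}, \theta^0, \xi) / \partial \hat{\theta}_i$ ($i = 1, \ldots, k$) are continuous
functions of $\hat{\theta}$ and $\xi$ in the domain $\sum_i (\hat{\theta}_i - \theta_i^0)^2 < \epsilon$ and $|\xi - \gamma| < \delta$.

(c) At least one of the partial derivatives $\left[\partial I(\hat{\theta}, \theta^0, \gamma) / \partial \hat{\theta}_i\right]_{\hat{\theta} = \theta^0}$ ($i = 1, \ldots, k$) is not
equal to zero.

Then the probability that

$I[\hat{\theta}, \theta^0, \xi(\beta, \gamma, \hat{\theta})] > \gamma$

holds, converges to $\beta$ with $n \to \infty$.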

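From Theorem 1 we obtain the following

Large sample solution of Problem 1. For large $n$ we can approximate the
lower and upper tolerance limits by $\phi[\hat{\theta}, \xi(\beta, \gamma, \hat{\theta})]$ and
$\psi[\hat{\theta}, \xi(\beta, \gamma, \hat{\theta})]$ respectively, where $\xi(\beta, \gamma, \theta)$ is given by (16).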

Now we will deal with Problem 2. We distinguish two cases.

(a) $\lim_{n \to \infty} N/n = \infty$.

It is easy to see that in this case the solution of Problem 2 is obtained from that
of Problem 1 by substituting $\lambda$ for $\gamma$. Hence for large $n$ the tolerance limits
can be approximated by $\phi[\hat{\theta}, \xi(\beta, \lambda, \hat{\theta})]$ and $\psi[\hat{\theta}, \xi(\beta, \lambda, \hat{\theta})]$ respectively.
For these tolerance limits condition (2) is fulfilled in the limit, i.e.

$\lim_{n \to \infty} P(M > \lambda N \mid \theta_1, \ldots, \theta_k) = \beta.$

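(b) The integers $n$ and $N$ approach infinity while $N/n$ remains bounded.

Denote $\sqrt{n}\,[I(\hat{\theta}, \theta, \xi) - \xi]$ by $u$ and $\sqrt{N}\,(M/N - \xi)$ by $v$, where $M$ denotes
the number of observations in the second sample which fall between the limits
$\phi(\hat{\theta}, \xi)$ and $\psi(\hat{\theta}, \xi)$. For any fixed value of $u$ the conditional expected value of
$M/N$ is given by $\xi + u/\sqrt{n}$ and the conditional variance of $M/N$ is given by
$\frac{1}{N}\left(\xi + \frac{u}{\sqrt{n}}\right)\left(1 - \xi - \frac{u}{\sqrt{n}}\right)$.
Hence the conditional expected value of $v$ is equal to $u \sqrt{N/n}$
and the conditional variance of $v$ is equal to $\left(\xi + \frac{u}{\sqrt{n}}\right)\left(1 - \xi - \frac{u}{\sqrt{n}}\right)$.
Since the limit distribution of $u$ is normal with zero mean and
standard deviation $\sigma(\theta, \xi)$ given in (13), we find that the limit bivariate
distribution of $u$ and $v$ is given by

(18) $\frac{1}{2\pi\, \sigma(\theta, \xi) \sqrt{\xi(1 - \xi)}} \exp\left[-\frac{u^2}{2\sigma^2(\theta, \xi)} - \frac{\left(v - u\sqrt{N/n}\right)^2}{2\xi(1 - \xi)}\right] du\, dv.$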

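From (18) it follows that the limit distribution of $v$ is normal with zero mean
and variance

(19) $\sigma_v^2 = \xi(1 - \xi) + \frac{N}{n}\, \sigma^2(\theta, \xi).$

From (19) it follows easily that the probability that

(20) $\frac{M}{N} > \xi + \frac{\lambda_\beta\, \sigma_v}{\sqrt{N}}$

converges to $\beta$ with $n \to \infty$. Let

(21) $\xi^*(\beta, \lambda, \theta) = \lambda - \frac{\lambda_\beta}{\sqrt{N}} \sqrt{\lambda(1 - \lambda) + \frac{N}{n}\, \sigma^2(\theta, \lambda)}.$

From (20) it follows that the probability that

$\frac{M}{N} > \lambda$

converges to $\beta$ with $n \to \infty$. The letter $M$ denotes the number of observations
in the second sample which lie between the limits $\phi[\hat{\theta}, \xi^*(\beta, \lambda, \hat{\theta})]$ and
$\psi[\hat{\theta}, \xi^*(\beta, \lambda, \hat{\theta})]$.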

We can summarize our results in the following

Theorem 2. Let $\phi(\theta, \xi)$ and $\psi(\theta, \xi)$ be two functions satisfying (5) and (6).
Two samples of size $n$ and $N$ respectively are drawn and the maximum likelihood
estimate $\hat{\theta}$ is calculated from the first sample only. Assume that conditions (a),
(b) and (c) of Theorem 1 are satisfied. Let $\xi(\beta, \gamma, \theta)$ and $\xi^*(\beta, \lambda, \theta)$ be defined
by (16) and (21) respectively.

If $n$ and $N/n$ both approach infinity, the probability that $M/N > \lambda$ holds, converges
to $\beta$, where $M$ denotes the number of observations in the second sample which lie
between the limits $\phi[\hat{\theta}, \xi(\beta, \lambda, \hat{\theta})]$ and $\psi[\hat{\theta}, \xi(\beta, \lambda, \hat{\theta})]$.

If $n$ and $N$ approach infinity while $N/n$ remains bounded, the probability that
$M/N > \lambda$ holds, converges to $\beta$, where $M$ denotes the number of observations in the
second sample which lie between the limits $\phi[\hat{\theta}, \xi^*(\beta, \lambda, \hat{\theta})]$ and $\psi[\hat{\theta}, \xi^*(\beta, \lambda, \hat{\theta})]$.
From Theorem 2 we obtain the following

Large sample solution of Problem 2. If $n$ and $N/n$ both approach infinity
the lower and upper tolerance limits can be approximated by $\phi[\hat{\theta}, \xi(\beta, \lambda, \hat{\theta})]$ and
$\psi[\hat{\theta}, \xi(\beta, \lambda, \hat{\theta})]$ respectively. If $n$ and $N$ both approach infinity while $N/n$ remains
bounded, the tolerance limits can be approximated by $\phi[\hat{\theta}, \xi^*(\beta, \lambda, \hat{\theta})]$ and
$\psi[\hat{\theta}, \xi^*(\beta, \lambda, \hat{\theta})]$ respectively. The expressions $\xi(\beta, \lambda, \theta)$ and $\xi^*(\beta, \lambda, \theta)$ are given
by (16) and (21) respectively.
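
The two adjustments can be computed directly once a value of $\sigma$ is available. The sketch below is a hedged illustration, not the author's procedure: it assumes that (14) defines $\lambda_\beta$ as the upper-tail point of the standard normal distribution, reads (16) as $\gamma - \lambda_\beta \sigma/\sqrt{n}$ and (21) as $\lambda - (\lambda_\beta/\sqrt{N})\sqrt{\lambda(1-\lambda) + N\sigma^2/n}$, and takes the number `sigma` as given. The function names are hypothetical.

```python
import math
from statistics import NormalDist


def lambda_beta(beta):
    """Upper-tail point: P(Z > lambda_beta) = beta for a standard normal Z."""
    return NormalDist().inv_cdf(1 - beta)


def xi(beta, gamma, sigma, n):
    """Adjustment of the form (16); sigma stands for sigma(theta, gamma)."""
    return gamma - lambda_beta(beta) * sigma / math.sqrt(n)


def xi_star(beta, lam, sigma, n, N):
    """Adjustment of the form (21); sigma stands for sigma(theta, lam)."""
    spread = math.sqrt(lam * (1 - lam) + N * sigma ** 2 / n)
    return lam - lambda_beta(beta) / math.sqrt(N) * spread


# Illustrative numbers only.
case_a = xi(0.95, 0.90, sigma=0.2, n=100)                     # N/n very large
case_b = xi_star(0.95, 0.90, sigma=0.2, n=100, N=100)         # N/n bounded
case_b_large_N = xi_star(0.95, 0.90, sigma=0.2, n=100, N=10 ** 12)
```

Since $(\lambda_\beta/\sqrt{N})\sqrt{\lambda(1-\lambda) + N\sigma^2/n} = \lambda_\beta \sqrt{\lambda(1-\lambda)/N + \sigma^2/n}$, the value of `xi_star` approaches that of `xi` (with $\lambda$ in place of $\gamma$) as $N/n$ grows, which matches the relation between the two cases of Theorem 2.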


4. The multivariate case. For any positive $\xi < 1$ let $\phi_i(\theta, \xi)$ and $\psi_i(\theta, \xi)$
($i = 1, \ldots, p$) be $p$ pairs of functions of $\theta$ such that

(22) $\int_{\phi_p(\theta, \xi)}^{\psi_p(\theta, \xi)} \cdots \int_{\phi_1(\theta, \xi)}^{\psi_1(\theta, \xi)} f(x_1, \ldots, x_p, \theta)\, dx_1 \cdots dx_p = \xi.$

If $f(x_1, \ldots, x_p, \theta)$ is a continuous function of $x_1, \ldots, x_p$, functions $\phi_i(\theta, \xi)$
and $\psi_i(\theta, \xi)$ ($i = 1, \ldots, p$) satisfying (22) certainly exist. As in the univariate
case, there will be infinitely many sets of $p$ pairs of functions $\phi_i(\theta, \xi)$ and $\psi_i(\theta, \xi)$
which satisfy (22). Since we wish to have tolerance limits as narrow as possible,
we will try to choose the functions $\phi_i(\theta, \xi)$ and $\psi_i(\theta, \xi)$ so that $\psi_i(\theta, \xi) - \phi_i(\theta, \xi)$
should be as small as possible. Since it is impossible to minimize all $p$ differences
$\psi_1(\theta, \xi) - \phi_1(\theta, \xi), \ldots, \psi_p(\theta, \xi) - \phi_p(\theta, \xi)$ simultaneously, we will have to be
satisfied with some compromise solution. For example, we could minimize
the product $\prod_i [\psi_i(\theta, \xi) - \phi_i(\theta, \xi)]$ or some other function of the $p$ differences
$\psi_i(\theta, \xi) - \phi_i(\theta, \xi)$. Another reasonable procedure would be to minimize
$\prod_i [\psi_i(\theta, \xi) - \phi_i(\theta, \xi)]$ subject to (22) and the condition that for any $i$ and $j$,
$\frac{\psi_i(\theta, \xi) - \phi_i(\theta, \xi)}{\psi_j(\theta, \xi) - \phi_j(\theta, \xi)}$ is equal to the ratio of the standard deviation of $x_i$ to that of $x_j$.

Here we will deal with the problem of deriving tolerance limits for the variates
$x_1, \ldots, x_p$ after the functions $\phi_i(\theta, \xi)$ and $\psi_i(\theta, \xi)$ have been chosen. Since
the theory of the multivariate case is very similar to that of the univariate
case, we will merely outline it briefly.

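As tolerance limits for $x_i$ we will use the functions $\phi_i(\hat{\theta}, \xi)$ and $\psi_i(\hat{\theta}, \xi)$ where
the value of $\xi$ has to be properly determined. Problem 1 is solved if we can
determine $\xi$ as a function of $\beta$ and $\gamma$ so that

(23) $P\left[\int_{\phi_p(\hat{\theta}, \xi)}^{\psi_p(\hat{\theta}, \xi)} \cdots \int_{\phi_1(\hat{\theta}, \xi)}^{\psi_1(\hat{\theta}, \xi)} f(x_1, \ldots, x_p, \theta)\, dx_1 \cdots dx_p > \gamma \,\Big|\, \theta\right] = \beta.$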

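Problem 2 is solved if we determine $\xi$ as a function of $\beta$, $\lambda$ and $N$ such that
condition (2) is fulfilled. Let

(24) $I(\hat{\theta}, \theta, \xi) = \int_{\phi_p(\hat{\theta}, \xi)}^{\psi_p(\hat{\theta}, \xi)} \cdots \int_{\phi_1(\hat{\theta}, \xi)}^{\psi_1(\hat{\theta}, \xi)} f(x_1, \ldots, x_p, \theta)\, dx_1 \cdots dx_p$

and let

(25) $I_i(\hat{\theta}, \theta, \xi, x_i) = \int_{\phi_p(\hat{\theta}, \xi)}^{\psi_p(\hat{\theta}, \xi)} \cdots \int_{\phi_{i+1}(\hat{\theta}, \xi)}^{\psi_{i+1}(\hat{\theta}, \xi)} \int_{\phi_{i-1}(\hat{\theta}, \xi)}^{\psi_{i-1}(\hat{\theta}, \xi)} \cdots \int_{\phi_1(\hat{\theta}, \xi)}^{\psi_1(\hat{\theta}, \xi)} f(x_1, \ldots, x_p, \theta)\, dx_1 \cdots dx_{i-1}\, dx_{i+1} \cdots dx_p.$

We have

(26) $\frac{\partial I(\hat{\theta}, \theta, \xi)}{\partial \hat{\theta}_i} = \sum_{s=1}^{p} \left\{ \frac{\partial \psi_s(\hat{\theta}, \xi)}{\partial \hat{\theta}_i}\, I_s[\hat{\theta}, \theta, \xi, \psi_s(\hat{\theta}, \xi)] - \frac{\partial \phi_s(\hat{\theta}, \xi)}{\partial \hat{\theta}_i}\, I_s[\hat{\theta}, \theta, \xi, \phi_s(\hat{\theta}, \xi)] \right\}.$

Assuming that the partial derivatives $\partial \phi_s(\hat{\theta}, \xi) / \partial \hat{\theta}_i$, $\partial \psi_s(\hat{\theta}, \xi) / \partial \hat{\theta}_i$ and
$\partial I(\hat{\theta}, \theta, \xi) / \partial \hat{\theta}_i$ ($i = 1, \ldots, k$) are continuous functions and that
$\left[\partial I(\hat{\theta}, \theta, \xi) / \partial \hat{\theta}_i\right]_{\hat{\theta} = \theta}$ is not zero for at least one value of
$i$, it follows from our Lemma that $\sqrt{n}\,[I(\hat{\theta}, \theta, \xi) - I(\theta, \theta, \xi)] = \sqrt{n}\,[I(\hat{\theta}, \theta, \xi) - \xi]$
is in the limit normally distributed with mean value zero and variance

(27) $\sigma^2(\theta, \xi) = \sum_{i=1}^{k} \sum_{j=1}^{k} \sum_{s=1}^{p} \sum_{t=1}^{p} \frac{\partial \psi_s(\theta, \xi)}{\partial \theta_i} \frac{\partial \psi_t(\theta, \xi)}{\partial \theta_j}\, I_s[\theta, \theta, \xi, \psi_s(\theta, \xi)]\, I_t[\theta, \theta, \xi, \psi_t(\theta, \xi)]\, \sigma_{ij}(\theta)$
$\qquad - 2 \sum_{i=1}^{k} \sum_{j=1}^{k} \sum_{s=1}^{p} \sum_{t=1}^{p} \frac{\partial \psi_s(\theta, \xi)}{\partial \theta_i} \frac{\partial \phi_t(\theta, \xi)}{\partial \theta_j}\, I_s[\theta, \theta, \xi, \psi_s(\theta, \xi)]\, I_t[\theta, \theta, \xi, \phi_t(\theta, \xi)]\, \sigma_{ij}(\theta)$
$\qquad + \sum_{i=1}^{k} \sum_{j=1}^{k} \sum_{s=1}^{p} \sum_{t=1}^{p} \frac{\partial \phi_s(\theta, \xi)}{\partial \theta_i} \frac{\partial \phi_t(\theta, \xi)}{\partial \theta_j}\, I_s[\theta, \theta, \xi, \phi_s(\theta, \xi)]\, I_t[\theta, \theta, \xi, \phi_t(\theta, \xi)]\, \sigma_{ij}(\theta)$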



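where $\|\sigma_{ij}(\theta)\|$ is the limit covariance matrix of $\sqrt{n}(\hat{\theta}_1 - \theta_1), \ldots,
\sqrt{n}(\hat{\theta}_k - \theta_k)$.

For any positive $\beta < 1$, let $\lambda_\beta$ be the real value defined by the equation

(28) $\frac{1}{\sqrt{2\pi}} \int_{\lambda_\beta}^{\infty} e^{-t^2/2}\, dt = \beta.$

Let

(29) $\xi(\beta, \gamma, \theta) = \gamma - \frac{\lambda_\beta\, \sigma(\theta, \gamma)}{\sqrt{n}}$

and

(30) $\xi^*(\beta, \lambda, \theta) = \lambda - \frac{\lambda_\beta}{\sqrt{N}} \sqrt{\lambda(1 - \lambda) + \frac{N}{n}\, \sigma^2(\theta, \lambda)}.$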


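We can easily prove the following two theorems:

Theorem 3. Let $\phi_i(\theta, \xi)$ and $\psi_i(\theta, \xi)$ ($i = 1, \ldots, p$) be $p$ pairs of functions
which satisfy (22). Let the functions $I(\hat{\theta}, \theta, \xi)$, $\sigma^2(\theta, \xi)$ and $\xi(\beta, \gamma, \theta)$ be defined
by (24), (27) and (29) respectively. Denote by $\theta_1^0, \ldots, \theta_k^0$ the true values of the
parameters $\theta_1, \ldots, \theta_k$. It is assumed that there exist two positive numbers $\epsilon$ and
$\delta$ such that the following three conditions are fulfilled:

(a) For any point $\theta$ for which $\sum_{i=1}^{k} (\theta_i - \theta_i^0)^2 < \epsilon$ the limit joint distribution of
$\sqrt{n}(\hat{\theta}_1 - \theta_1), \ldots, \sqrt{n}(\hat{\theta}_k - \theta_k)$, calculated under the assumption that $\theta$ is the
true parameter point, is normal with zero means and a finite non-singular covariance
matrix $\|\sigma_{ij}(\theta)\|$ where $\sigma_{ij}(\theta)$ is a continuous function of $\theta$ in the domain
$\sum_i (\theta_i - \theta_i^0)^2 < \epsilon$.

(b) The partial derivatives $\partial I(\hat{\theta}, \theta^0, \xi) / \partial \hat{\theta}_i$ ($i = 1, \ldots, k$) are continuous
functions of $\hat{\theta}$ and $\xi$ in the domain $\sum_{i=1}^{k} (\hat{\theta}_i - \theta_i^0)^2 < \epsilon$ and $|\xi - \gamma| < \delta$.

(c) At least one of the partial derivatives $\left[\partial I(\hat{\theta}, \theta^0, \gamma) / \partial \hat{\theta}_i\right]_{\hat{\theta} = \theta^0}$ ($i = 1, \ldots, k$) is
not equal to zero.

Then the probability that

$I[\hat{\theta}, \theta^0, \xi(\beta, \gamma, \hat{\theta})] > \gamma$

holds, converges to $\beta$ with $n \to \infty$.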

Theorem 4. Let $\phi_i(\theta, \xi)$ and $\psi_i(\theta, \xi)$ ($i = 1, \ldots, p$) be $p$ pairs of functions
which satisfy (22). Two samples of size $n$ and $N$ respectively are drawn and the
maximum likelihood estimate $\hat{\theta}$ is calculated from the first sample only. Assume
that conditions (a), (b) and (c) of Theorem 3 are fulfilled and let $\xi(\beta, \gamma, \theta)$ and
$\xi^*(\beta, \lambda, \theta)$ be defined by (29) and (30) respectively. Denote by $y_{i\alpha}$ the outcome of
the $\alpha$-th observation on the $i$-th variate in the second sample.

If $n$ and $N/n$ both approach infinity, the probability that $M > \lambda N$ holds converges
to $\beta$, where $M$ denotes the number of different values of $\alpha$ for which

$\phi_i[\hat{\theta}, \xi(\beta, \lambda, \hat{\theta})] \le y_{i\alpha} \le \psi_i[\hat{\theta}, \xi(\beta, \lambda, \hat{\theta})]$ ($i = 1, \ldots, p$).

If $n$ and $N$ approach infinity while $N/n$ remains bounded, the probability that
$M > \lambda N$ holds converges to $\beta$ when $M$ denotes the number of different values of $\alpha$
for which

$\phi_i[\hat{\theta}, \xi^*(\beta, \lambda, \hat{\theta})] \le y_{i\alpha} \le \psi_i[\hat{\theta}, \xi^*(\beta, \lambda, \hat{\theta})]$ ($i = 1, \ldots, p$).


The proofs of Theorems 3 and 4 are omitted since they are similar to the
proofs of Theorems 1 and 2.

From Theorem 3 we obtain the following

Large sample solution of Problem 1. For large $n$ we can approximate the
lower and upper tolerance limits for $x_i$ by $\phi_i[\hat{\theta}, \xi(\beta, \gamma, \hat{\theta})]$ and $\psi_i[\hat{\theta}, \xi(\beta, \gamma, \hat{\theta})]$
respectively where $\xi(\beta, \gamma, \theta)$ is given by (29).

From Theorem 4 we obtain the following

Large sample solution of Problem 2. If $n$ and $N/n$ approach infinity, the
lower and upper tolerance limits for $x_i$ can be approximated by $\phi_i[\hat{\theta}, \xi(\beta, \lambda, \hat{\theta})]$ and
$\psi_i[\hat{\theta}, \xi(\beta, \lambda, \hat{\theta})]$ respectively. If $n$ and $N$ both approach infinity while $N/n$ remains
bounded, the tolerance limits for $x_i$ can be approximated by $\phi_i[\hat{\theta}, \xi^*(\beta, \lambda, \hat{\theta})]$ and
$\psi_i[\hat{\theta}, \xi^*(\beta, \lambda, \hat{\theta})]$ respectively. The expressions $\xi(\beta, \lambda, \theta)$ and $\xi^*(\beta, \lambda, \theta)$ are defined
in (29) and (30) respectively.


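5. An example. Let $x$ be a normally distributed variate with mean value $\theta_1$
and standard deviation $\theta_2$, i.e. the probability density function of $x$ is given by

$f(x, \theta_1, \theta_2) = \frac{1}{\sqrt{2\pi}\, \theta_2}\, e^{-(x - \theta_1)^2 / (2\theta_2^2)}.$

For any positive $\xi < 1$ let $\rho(\xi)$ be the value for which

$\frac{1}{\sqrt{2\pi}} \int_{-\rho(\xi)}^{\rho(\xi)} e^{-t^2/2}\, dt = \xi.$

Then the functions

$\phi(\theta, \xi) = \theta_1 - \rho(\xi)\, \theta_2$

and

$\psi(\theta, \xi) = \theta_1 + \rho(\xi)\, \theta_2$

satisfy conditions (5) and (6).
We have

$\hat{\theta}_1 = \frac{x_1 + \cdots + x_n}{n} = \bar{x}$ and $\hat{\theta}_2^2 = \frac{\sum_\alpha (x_\alpha - \bar{x})^2}{n}.$

The variance of $\sqrt{n}(\hat{\theta}_1 - \theta_1)$ is equal to $\theta_2^2$ and the limit variance of $\sqrt{n}(\hat{\theta}_2 - \theta_2)$
is equal to $\frac{1}{2}\theta_2^2$. Since the covariance of $\hat{\theta}_1$ and $\hat{\theta}_2$ is equal to zero, we obtain
from (13)

$\sigma^2(\theta, \xi) = \frac{\rho^2(\xi)\, e^{-\rho^2(\xi)}}{\pi}.$

Hence for large $n$ the tolerance limits satisfying (1) can be approximated by
$\hat{\theta}_1 - \rho(\xi)\hat{\theta}_2$ and $\hat{\theta}_1 + \rho(\xi)\hat{\theta}_2$ respectively where

$\xi = \gamma - \frac{\lambda_\beta\, \rho(\gamma)\, e^{-\rho^2(\gamma)/2}}{\sqrt{\pi n}}$

and $\lambda_\beta$ is the value determined by the equation

$\frac{1}{\sqrt{2\pi}} \int_{\lambda_\beta}^{\infty} e^{-t^2/2}\, dt = \beta.$

If $n$ and $N$ are large, the tolerance limits satisfying (2) can be approximated by
$\hat{\theta}_1 - \rho(\xi^*)\hat{\theta}_2$ and $\hat{\theta}_1 + \rho(\xi^*)\hat{\theta}_2$ respectively where

$\xi^* = \lambda - \frac{\lambda_\beta}{\sqrt{N}} \sqrt{\lambda(1 - \lambda) + \frac{N\, \rho^2(\lambda)\, e^{-\rho^2(\lambda)}}{\pi n}}.$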
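
The normal example can be tried out numerically. The sketch below is an illustration under stated assumptions rather than a definitive implementation: it specializes (13) to the normal family (with $\partial\psi/\partial\theta_1 = \partial\phi/\partial\theta_1 = 1$, $\partial\psi/\partial\theta_2 = -\partial\phi/\partial\theta_2 = \rho$, limit variances $\theta_2^2$ and $\theta_2^2/2$), which gives $\sigma^2 = \rho^2 e^{-\rho^2}/\pi$; it takes $\lambda_\beta$ as the upper-tail normal point; and it adds a guard on $\xi$ that is a practical assumption for small $n$, not part of the text. All function names are hypothetical.

```python
import math
import random
from statistics import NormalDist, mean

STD = NormalDist()


def rho(xi):
    """rho(xi): P(|Z| <= rho) = xi for a standard normal Z."""
    return STD.inv_cdf((1 + xi) / 2)


def sigma2(xi):
    """Limit variance for the normal family: rho^2 * exp(-rho^2) / pi."""
    r = rho(xi)
    return r * r * math.exp(-r * r) / math.pi


def tolerance_limits(sample, beta, gamma):
    """Large-sample limits theta1_hat -/+ rho(xi) * theta2_hat for Problem 1."""
    n = len(sample)
    m = mean(sample)
    s = math.sqrt(sum((x - m) ** 2 for x in sample) / n)   # maximum likelihood estimate
    lam_beta = STD.inv_cdf(1 - beta)                        # upper-tail point (assumption)
    xi = gamma - lam_beta * math.sqrt(sigma2(gamma) / n)
    xi = min(xi, 1 - 1e-12)     # guard for small n; not part of the text
    r = rho(xi)
    return m - r * s, m + r * s


def coverage_rate(n, beta, gamma, reps, seed=0):
    """Fraction of simulated samples whose limits contain at least gamma of N(0, 1)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        lo, hi = tolerance_limits(sample, beta, gamma)
        if STD.cdf(hi) - STD.cdf(lo) >= gamma:
            hits += 1
    return hits / reps
```

Calling `coverage_rate(n=200, beta=0.95, gamma=0.90, reps=2000)` estimates how often the limits cover at least 90 per cent of the population; for large $n$ this fraction should lie near $\beta$, up to simulation error and the error of the large-sample approximation.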



STATISTICAL PREDICTION WITH SPECIAL REFERENCE TO THE
PROBLEM OF TOLERANCE LIMITS¹

By S. S. Wilks
Princeton University

1. Introduction. Statistical methodology is becoming recognized in industry
as an effective tool for dealing with certain problems of inspection and quality
control in mass production. Quality control experts have found statistical
methods useful in detecting excessive variation in a given quality characteristic
of a product from a series of observations on the given quality characteristic,
and in isolating the causes of such variations back in the materials or operations
involved in manufacturing the product. By a process of successive detection
and elimination of causes of variability, a controlled state of quality is established.
A practical statistical procedure for establishing a controlled state of quality
has been developed by Shewhart.² More recently, manuals for routine
application of this procedure have been issued by the American Standards
Association.³

In this paper we do not propose to go into a discussion of the application of
the well known Shewhart procedure. The reader may refer to the literature
mentioned in footnotes 2 and 3 for such discussion. It is sufficient to remark
that experience shows that the application of this procedure leads to a
controlled state of quality. Such a state of control provides a basis for making
statistical predictions about measurements on the given quality characteristic
in future production.

More specifically, suppose a given quality characteristic of a given product is
measured by a variable $X$, such that $X$ has a specific value for each individual
product-piece. For example, the product may be a given type of fuse and $X$
may be the blowing time in seconds. A product-piece would be a single fuse,
and $X$ would take on a value for each fuse. Thus, for a sequence of $n$ fuses
taken from the production line, there would be a corresponding sequence of
values of $X$, say $X_1, X_2, \ldots, X_n$. If a state of control has been established
with respect to blowing time as measured by $X$, then the sequence of values
of $X$ will “behave like a random sequence.” By this we mean that the sequence
will be such that we can safely assume that it can be described mathematically
by regarding $X$ as a continuous random variable, i.e., such that there exists some
probability function $f(x)$ which describes the distribution of values of $X$, such
that $\int_a^b f(x)\, dx$ is the probability that $a \le X \le b$ for any two real numbers
$a$ and $b$. Now, suppose we consider a sequence or sample $S_1$ of $n$ values of $X$,
and let $X_1$ and $X_n$ be the smallest and largest values of $X$ in the sequence.
The types of questions with which we are concerned are the following: If a
further sample, say $S_2$ of $N$ values of $X$ is taken, what is the probability $P$ that
at least $N_0$ of the values will lie between $X_1$ and $X_n$ as determined by $S_1$? If
we choose a given probability $\alpha$, at least what proportion of values of $X$ in an
indefinitely large sample $S_2$ will fall between $X_1$ and $X_n$ of $S_1$ with probability $\alpha$?
What is the probability $P'$ that at least $N_0$ of the values of $S_2$ will exceed $X_1$
of $S_1$? At least what proportion of values of $X$ in an indefinitely large sample
$S_2$ will exceed $X_1$ with probability $\alpha$? These questions suggest several of a
more general nature which can be treated by methods similar to those which
will be discussed. For example, instead of taking $X_1$ and $X_n$, i.e. the smallest
and largest items in $S_1$ as tolerance limits we could use $X_m$ and $X_{n-m+1}$. More
generally, we may define $100R_\alpha\%$ tolerance limits $L_1(x_1, x_2, \ldots, x_n)$ and
$L_2(x_1, x_2, \ldots, x_n)$ for probability level $\alpha$ of a sample $S_1$ of size $n$ from a
population with distribution $f(x)\, dx$ as two functions of the $X$'s in $S_1$ such that the
probability is $\alpha$ that at least $100R_\alpha\%$ of the $X$'s of a further indefinitely large
sample $S_2$ (i.e. the population) will lie between $L_1$ and $L_2$. Or more briefly

$P\left(\int_{L_1}^{L_2} f(x)\, dx \ge R_\alpha\right) = \alpha.$

The same notion clearly applies if $S_2$ is a finite sample of size $N$, rather than an
indefinitely large one. In this case we would be interested in the largest integer
$N_\alpha$ such that the probability is at least $\alpha$ that at least $100R_\alpha\%$ ($R_\alpha = N_\alpha/N$)
of the $X$'s in $S_2$ would lie between $L_1$ and $L_2$. In most practical situations we
are able to assume nothing more about $f(x)$ than it is a probability density
function. We make only this assumption here. The only functions of the
values of $X$ in $S_1$ that we shall consider here in setting tolerance limits are order
statistics, i.e. the ordered values of $X$, because the results will then be fairly
simple and independent of $f(x)$.

¹ An expository paper presented at a joint session of the American Mathematical Society
and the Institute of Mathematical Statistics at Poughkeepsie, September 9, 1942.

² W. A. Shewhart, Control of Quality of Manufactured Product, D. Van Nostrand
Company, New York, 1931.

³ Guide for Quality Control and Control Chart Method of Analyzing Data (1941), and
Control Chart Method of Controlling Quality During Production (1940), American Standards
Association, New York.

2. A General Probability Formula. It will be convenient perhaps to derive 
a general probability formula at this stage from which we can derive certain 
special cases as we need them. 

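Let $X_1, X_2, \ldots, X_n$ be the $n$ values of $X$ in $S_1$ arranged in order of increasing magnitude. Let $r_1, r_2, \ldots, r_k$ be integers such that $1 \le r_1 < r_2 < \cdots < r_k \le n$. Let $x_{r_1}, x_{r_2}, \ldots, x_{r_k}$ be $k$ real numbers. Let

$$p_1 = \int_{-\infty}^{x_{r_1}} f(x)\,dx,\quad p_2 = \int_{x_{r_1}}^{x_{r_2}} f(x)\,dx,\quad \ldots,\quad p_{k+1} = \int_{x_{r_k}}^{\infty} f(x)\,dx,$$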



S. S. WILKS


from which

$$f(x_{r_1})\,dx_{r_1} = dp_1,\quad f(x_{r_2})\,dx_{r_2} = dp_2,\quad \ldots,\quad f(x_{r_k})\,dx_{r_k} = dp_k.$$

Then assuming $X_1, X_2, \ldots, X_n$ to be a random sample (ordered) from a population with probability element $f(x)\,dx$, it follows from the multinomial distribution law⁴ that the probability of $x_{r_i} < X_{r_i} < x_{r_i} + dx_{r_i}$ $(i = 1, 2, \ldots, k)$ is given by

$$\frac{n!}{(r_1-1)!\,(r_2-r_1-1)!\cdots(r_k-r_{k-1}-1)!\,(n-r_k)!}\;p_1^{r_1-1}\,p_2^{r_2-r_1-1}\cdots p_k^{r_k-r_{k-1}-1}\,p_{k+1}^{n-r_k}\;dp_1\,dp_2\cdots dp_k \tag{1}$$

except for terms of order higher than $(dp_1\,dp_2\cdots dp_k)$. Given that $X_{r_1} = x_{r_1}, \ldots, X_{r_k} = x_{r_k}$ in $S_1$, the conditional probability that $N_1, N_2, \ldots, N_{k+1}$ $(\sum N_i = N)$ of the values of $X$ in $S_2$ will fall in the intervals $(-\infty, x_{r_1})$, $(x_{r_1}, x_{r_2})$, $\ldots$, $(x_{r_k}, \infty)$ respectively is by the multinomial law

$$\frac{N!}{N_1!\,N_2!\cdots N_{k+1}!}\;p_1^{N_1}\,p_2^{N_2}\cdots p_{k+1}^{N_{k+1}}. \tag{2}$$

The joint probability law of $X_{r_1}, X_{r_2}, \ldots, X_{r_k}$ and $N_1, N_2, \ldots, N_{k+1}$ $(\sum N_i = N)$ is given by the product of (1) and (2). Integrating this product with respect to the $x$'s (i.e. the $p$'s) we find the probability law of the $N$'s to be

$$\frac{N!\,n!\,(N_1+r_1-1)!\,(N_2+r_2-r_1-1)!\cdots(N_k+r_k-r_{k-1}-1)!\,(N_{k+1}+n-r_k)!}{(r_1-1)!\,(r_2-r_1-1)!\cdots(r_k-r_{k-1}-1)!\,(n-r_k)!\,(N+n)!\,N_1!\,N_2!\cdots N_{k+1}!} \tag{3}$$

which is clearly independent of $f(x)$. This result can be derived by direct combinatorial methods but the present derivation provides a simple proof that the result is independent of $f(x)$.
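As a minimal sketch, formula (3) can be evaluated exactly with rational arithmetic. The function names `prob_counts` and `compositions` are illustrative assumptions and not part of the paper; `ranks` stands for $(r_1, \ldots, r_k)$ and `counts` for $(N_1, \ldots, N_{k+1})$.

```python
from fractions import Fraction
from itertools import product
from math import factorial


def prob_counts(counts, ranks, n):
    """Probability (3) that counts[i] of the N further values fall in the
    i-th interval cut out by the order statistics X_{r_1} < ... < X_{r_k}."""
    N = sum(counts)
    # Numbers of initial-sample values strictly inside each of the k+1 intervals:
    # r_1 - 1, r_2 - r_1 - 1, ..., n - r_k.
    gaps = [b - a - 1 for a, b in zip((0, *ranks), (*ranks, n + 1))]
    value = Fraction(factorial(N) * factorial(n), factorial(N + n))
    for c, g in zip(counts, gaps):
        value *= Fraction(factorial(c + g), factorial(c) * factorial(g))
    return value


def compositions(total, parts):
    """All tuples of `parts` non-negative integers adding up to `total`."""
    for head in product(range(total + 1), repeat=parts - 1):
        if sum(head) <= total:
            yield (*head, total - sum(head))


# Hypothetical case: n = 6, N = 4, limits at the smallest and largest values.
n, N, ranks = 6, 4, (1, 6)
total = sum(prob_counts(c, ranks, n) for c in compositions(N, len(ranks) + 1))
print(total)  # prints 1
```

Since (3) is a probability law, its values over all admissible $(N_1, \ldots, N_{k+1})$ should add up to one whatever $f(x)$ is, which is what the final line checks for one small case.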


3. The Problem of One Tolerance Limit. There are problems in quality control in which it is important to consider only one tolerance limit. For example, in testing breaking strength of steel wire the most significant tolerance limit is the lower one. The problem of prediction in this case is as follows:


⁴ Which states that if a trial results in one and only one of the mutually exclusive events $E_1, E_2, \ldots, E_h$, the probability $P$ that in a total of $n$ trials $n_1$ will result in $E_1$, $n_2$ in $E_2$, $\ldots$, $n_h$ in $E_h$ $(\sum n_i = n)$, is given by

$$P = \frac{n!}{n_1!\,n_2!\cdots n_h!}\;p_1^{n_1}\,p_2^{n_2}\cdots p_h^{n_h},$$

where $p_1, p_2, \ldots, p_h$ are the probabilities of a single trial resulting in $E_1, E_2, \ldots, E_h$ respectively.





Suppose the given quality characteristic, as measured by $X$, is in a state of statistical control, and that a sequence of $n$ measurements on $X$ has been made. Let $X_1$ be the smallest of the $n$ values. What is the probability that at least $N_0$ of $N$ further measurements on $X$ will exceed the value $X_1$ as determined by the initial sample? Instead of considering the smallest value of $X$ as the lower tolerance limit we could just as easily choose the second smallest, or any other small order statistic, but the case of the smallest value is perhaps of greater practical interest than any other case. The problem of an upper tolerance limit is entirely similar to that of a lower tolerance limit.

Table I

Values of $N_\alpha$ and $R_\alpha$ for $\alpha$ = 0.99 and 0.95 for several combinations of values of $N$ and $n$, and for the problem of one tolerance limit. (For $N = \infty$, $R_\alpha$ is denoted by $\bar{R}_\alpha$.)

                 α = 0.99          α = 0.95
   n      N     N_α     R_α       N_α     R_α
  10     10       5    .500         7    .700
  10     20      11    .550        14    .700
  10      ∞       -    .631         -    .741
  50     50      44    .880        46    .920
  50    100      90    .900        93    .930
  50      ∞       -    .912         -    .942
 100    100      94    .940        96    .960
 100    200     189    .945       193    .965
 100      ∞       -    .955         -    .970
 500    500     494    .988       496    .992
 500   1000     989    .989       993    .993
 500      ∞       -    .991         -    .994

The probability $P_1(N_0)$ that $N_0$ of the $N$ further measurements will exceed the smallest value of $X$ in an initially drawn sample of size $n$ is given by (3) for $k = 1$, $r_1 = 1$, $N_2 = N_0$, $N_1 = N - N_0$, i.e.

$$P_1(N_0) = n\,\frac{N!\,(N_0+n-1)!}{N_0!\,(N+n)!}. \tag{4}$$

Values of $P_1(N_0)$ can be easily calculated by using the recursion formula

$$P_1(N_0-1) = \frac{N_0}{N_0+n-1}\,P_1(N_0). \tag{5}$$


















For given values of $N$, $n$ and $\alpha$ we are interested in the largest integer $N_\alpha$ for which

$$\sum_{N_0=N_\alpha}^{N} P_1(N_0) \ge \alpha. \tag{6}$$

If we set $N_\alpha/N = R_\alpha$ and set $\lim_{N\to\infty} R_\alpha = \bar{R}_\alpha$ it can be verified that the value of $\bar{R}_\alpha$ is given by solving the following equation for $\bar{R}_\alpha$:

$$n\int_{\bar{R}_\alpha}^{1} \xi^{n-1}\,d\xi = \alpha. \tag{7}$$

It will be observed that $n\xi^{n-1}\,d\xi$ is to within terms of order $d\xi$ the probability that $\xi < \int_{X_1}^{\infty} f(x)\,dx < \xi + d\xi$ in samples of size $n$ from a distribution with probability element $f(x)\,dx$, where $X_1$ is the smallest value of $X$ in the sample. The statistical interpretation of (7) is simply this: The probability is $\alpha$ that the proportion of values of $X$ exceeding $X_1$ in a further indefinitely large sample is at least $\bar{R}_\alpha$.

Choosing $\alpha$ = 0.99 and 0.95, Table I shows values of $N_\alpha$ and $R_\alpha$ for various combinations of values of $n$ and $N$ for the case of one tolerance limit. The table indicates the degree of precision with which predictions about a single tolerance limit can be made from a sample of size $n$ about a further sample of size $N$ for a few important values of $n$ and $N$. It should be noted that each prediction is made concerning a pair of samples, i.e. an initial sample of size $n$ and a further sample of size $N$, and that the prediction holds for any function $f(x)$. Thus as a typical entry we may state that if a sample of 100 is drawn and also a sample of 200, then the probability is 0.99 (approx.) that the $X$'s of at least 189 (or 94.5%) of the cases in the second sample will exceed the smallest $X$ in the first sample.
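The computation behind an entry of Table I can be sketched as follows, starting from $P_1(N) = n/(N+n)$ as given by (4) and stepping downward with recursion (5) until the accumulated probability in (6) reaches $\alpha$. The function names are illustrative assumptions; the closed form used for the limiting case comes from carrying out the integral in (7), which gives $1 - \bar{R}_\alpha^{\,n} = \alpha$.

```python
from fractions import Fraction


def one_limit_n_alpha(n, N, alpha):
    """Largest N_alpha with sum_{N0 >= N_alpha} P_1(N0) >= alpha, per (4)-(6)."""
    p = Fraction(n, N + n)             # P_1(N) from (4)
    tail, n0 = p, N
    while tail < alpha and n0 > 0:
        p *= Fraction(n0, n0 + n - 1)  # recursion (5): P_1(n0 - 1) from P_1(n0)
        n0 -= 1
        tail += p
    return n0


def one_limit_r_bar(n, alpha):
    """Limiting proportion solving (7), i.e. 1 - R**n = alpha."""
    return (1 - alpha) ** (1 / n)


print(one_limit_n_alpha(10, 10, Fraction(99, 100)))  # prints 5
print(one_limit_n_alpha(10, 10, Fraction(95, 100)))  # prints 7
print(round(one_limit_r_bar(10, 0.99), 3))           # prints 0.631
```

Exact fractions are used for the finite case so that the comparison with $\alpha$ is not affected by rounding; the three printed values agree with the $n = 10$ rows of Table I.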

4. The Problem of Two Tolerance Limits. Again, suppose the given quality characteristic as measured by $X$ is in a state of statistical control and that a sequence of $n$ measurements is made on $X$. Let $X_1$ and $X_n$ be the smallest and largest values of $X$ respectively. The question to be considered now is the following: What is the probability that at least $N_0$ of $N$ further measurements on $X$ will lie between the values $X_1$ and $X_n$, as determined by the initial sample?

We proceed by considering the special case of (3) for which $k = 2$, $r_1 = 1$, $r_2 = n$, $N_2 = N_0$, $N_3 = N - N_0 - N_1$. We find for the joint distribution of $N_1$ and $N_0$

$$P(N_1, N_0) = \frac{N!\,n!\,(N_0+n-2)!}{(n-2)!\,N_0!\,(N+n)!}. \tag{8}$$




To obtain the distribution of $N_0$, we simply sum (8) with respect to $N_1$ from 0 to $N - N_0$, thus obtaining

$$P_2(N_0) = n(n-1)(N-N_0+1)\,\frac{N!\,(N_0+n-2)!}{N_0!\,(N+n)!}. \tag{9}$$

A convenient recursion formula for computation purposes is

$$P_2(N_0-1) = \frac{N_0\,(N-N_0+2)}{(N-N_0+1)(N_0+n-2)}\,P_2(N_0). \tag{10}$$

For given values of $N$, $n$ and $\alpha$ we require the largest value of $N_\alpha$ for which

$$\sum_{N_0=N_\alpha}^{N} P_2(N_0) \ge \alpha. \tag{11}$$

Setting $N_\alpha/N = R_\alpha$ and $\lim_{N\to\infty} R_\alpha = \bar{R}_\alpha$ one finds that $\bar{R}_\alpha$ is given by solving the equation⁵ for $\bar{R}_\alpha$

$$n(n-1)\int_{\bar{R}_\alpha}^{1} \xi^{n-2}(1-\xi)\,d\xi = \alpha. \tag{12}$$

It can be verified that $n(n-1)\xi^{n-2}(1-\xi)\,d\xi$ is to within terms of order $d\xi$ the probability that $\xi < \int_{X_1}^{X_n} f(x)\,dx < \xi + d\xi$, thus showing that (12) is the probability that the proportion of an indefinitely large number of further values of $X$ lying between $X_1$ and $X_n$ is at least $\bar{R}_\alpha$.

Table II gives, for the case of two tolerance limits, values of $N_\alpha$ and $R_\alpha$ for several important combinations of $n$ and $N$, including limiting values $\bar{R}_\alpha$ of $R_\alpha$ for indefinitely large $N$.

It should be noted that the problem of two tolerance limits can be immediately extended to the case where the lower and upper tolerance limits may be any two of the order statistics in $S_1$.
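A sketch of the corresponding computation for two tolerance limits follows. It starts from $P_2(N) = n(n-1)/[(N+n)(N+n-1)]$, which is (9) at $N_0 = N$, and applies recursion (10). For the limiting case it uses the fact that the left member of (12) integrates to $1 - n\bar{R}_\alpha^{\,n-1} + (n-1)\bar{R}_\alpha^{\,n}$, and solves by bisection. The function names and the bisection tolerance are illustrative assumptions, and $n \ge 2$ is assumed.

```python
from fractions import Fraction


def two_limit_n_alpha(n, N, alpha):
    """Largest N_alpha with sum_{N0 >= N_alpha} P_2(N0) >= alpha, per (9)-(11)."""
    p = Fraction(n * (n - 1), (N + n) * (N + n - 1))   # P_2(N) from (9)
    tail, n0 = p, N
    while tail < alpha and n0 > 0:
        # Recursion (10): P_2(n0 - 1) from P_2(n0).
        p *= Fraction(n0 * (N - n0 + 2), (N - n0 + 1) * (n0 + n - 2))
        n0 -= 1
        tail += p
    return n0


def coverage_probability(r, n):
    """Left member of (12) with lower limit r, integrated in closed form."""
    return 1 - n * r ** (n - 1) + (n - 1) * r ** n


def two_limit_r_bar(n, alpha, tol=1e-12):
    """Solve (12) for the limiting proportion by bisection on [0, 1]."""
    lo, hi = 0.0, 1.0   # coverage_probability falls from 1 at r = 0 to 0 at r = 1
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if coverage_probability(mid, n) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


print(two_limit_n_alpha(10, 10, Fraction(95, 100)))  # prints 5
```

For $n = N = 10$ and $\alpha = 0.95$ this reproduces the tabulated $N_\alpha = 5$.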


5. The Problem of Tolerance Limits for Two Quality Characteristics. We have thus far devoted our discussion to the problem of tolerance limits for a single quality characteristic. The problem of two or more quality characteristics can be treated by methods similar to those already used. The simplest case is that in which each product-piece under consideration is measured on two independent quality characteristics. Suppose the two characteristics are measured by $X$ and $Y$. Let a sample of $n$ product-pieces be taken, assuming a state of statistical control has been established, and let $X_1$ be the smallest of the $X$ values and $Y_1$ the smallest of the $Y$ values. The question with which we are


⁵ This limiting case in the problem of tolerance limits as well as that expressed in (7) and other similar limiting cases have been considered by the author in an earlier paper, "Determination of Sample Sizes for Setting Tolerance Limits," Annals of Math. Stat., Vol. XII (1941), pp. 91-96.





concerned here is the following: If $N$ further product-pieces are measured on $X$ and $Y$, what is the probability that $X > X_1$ and $Y > Y_1$ for $N_0$ of the pieces? Let $X$ and $Y$ be statistically independent and $f(x)$ and $g(y)$ be the probability functions of $X$ and $Y$ respectively. Let $\int_{-\infty}^{X_1} f(x)\,dx = p$ and $\int_{-\infty}^{Y_1} g(y)\,dy = q$. The probability law of $p$ and $q$ is

$$n^2(1-p)^{n-1}(1-q)^{n-1}\,dp\,dq. \tag{13}$$


Table II

Values of $N_\alpha$ and $R_\alpha$ for $\alpha$ = .99 and .95 for several combinations of values of $N$ and $n$ and for the problem of two tolerance limits. (For $N = \infty$, $R_\alpha$ is denoted by $\bar{R}_\alpha$.)

                 α = 0.99          α = 0.95
   n      N     N_α     R_α       N_α     R_α
  10     10       4    .400         5    .500
  10     20       8    .400        11    .550
  10      ∞       -    .490         -    [illegible]
  50     50      42    .840        44    .880
  50    100      85    .850        90    .900
  50      ∞       -    .874         -    .909
 100    100      89    .890        92    .920
 100    200     184    .920       188    .940
 100      ∞       -    .935         -    .953
 500    500     491    .982       494    .988
 500   1000     985    .985       989    .989
 500      ∞       -    .987         -    .991


In a further sample of size $N$ the probability that for $N_0$ of the cases, $X > X_1$ and $Y > Y_1$, $X_1$ and $Y_1$ being determined by the first sample, is

$$\binom{N}{N_0}\,[(1-p)(1-q)]^{N_0}\,[1-(1-p)(1-q)]^{N-N_0}. \tag{14}$$

The joint probability law of $N_0$, $p$ and $q$ is given by the product of (13) and (14). Integrating this product with respect to $p$ and $q$ we obtain as the probability law of $N_0$,

$$P_3(N_0) = n^2\binom{N}{N_0}\sum_{i=0}^{N-N_0}\binom{N-N_0}{i}\frac{(-1)^i}{(n+N_0+i)^2}. \tag{15}$$




























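For given values of $N$, $n$ and $\alpha$ it is important, as before, to determine $N_\alpha$ as the largest integer for which

$$\sum_{N_0=N_\alpha}^{N} P_3(N_0) \ge \alpha. \tag{16}$$

Setting $N_\alpha/N = R_\alpha$ and $\lim_{N\to\infty} R_\alpha = \bar{R}_\alpha$ one finds $\bar{R}_\alpha$ to be given by solving the following equation for $\bar{R}_\alpha$:

$$-n^2\int_{\bar{R}_\alpha}^{1} \xi^{n-1}\log\xi\,d\xi = \alpha. \tag{17}$$

The expression $-n^2\xi^{n-1}\log\xi\,d\xi$ is simply the probability that $\xi < \int_{X_1}^{\infty}\int_{Y_1}^{\infty} f(x)\,g(y)\,dy\,dx < \xi + d\xi$ to within terms of order $d\xi$, where the double integral is the proportion of the population pairs $(X, Y)$ for which $X > X_1$ and $Y > Y_1$.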
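The alternating sum in (15) is easy to evaluate exactly, and (16) can then be applied as in the single-characteristic case. The sketch below is illustrative only: the function names are assumptions, and exact fractions are used because the terms of the sum alternate in sign and would lose accuracy in floating point for large $N$.

```python
from fractions import Fraction
from math import comb


def p3(n0, n, N):
    """Probability (15) that exactly n0 of N further pieces have X > X_1 and Y > Y_1."""
    total = sum(Fraction((-1) ** i * comb(N - n0, i), (n + n0 + i) ** 2)
                for i in range(N - n0 + 1))
    return n * n * comb(N, n0) * total


def n_alpha_two_characteristics(n, N, alpha):
    """Largest N_alpha whose upper tail of (15) is at least alpha, as in (16)."""
    tail, n0 = p3(N, n, N), N
    while tail < alpha and n0 > 0:
        n0 -= 1
        tail += p3(n0, n, N)
    return n0


# Hypothetical small case: the probabilities for N0 = 0, ..., N add up to one.
print(sum(p3(k, 5, 6) for k in range(7)))  # prints 1
```

At $N_0 = N$ the sum has a single term and (15) reduces to $n^2/(n+N)^2$, the square of the corresponding one-characteristic value $n/(N+n)$ from (4), as independence of $X$ and $Y$ would suggest.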

In the problem of two tolerance limits for each quality characteristic, as determined by an initial sample of size $n$, we calculate the probability that $N_0$ members of a further sample of size $N$ will fall within the two sets of tolerance limits, with respect to the two characteristics. The problem is similar to that for one tolerance limit for each of two quality characteristics. For this case, we find corresponding to (15), (16), (17), respectively, the following:

$$P_4(N_0) = n^2(n-1)^2\binom{N}{N_0}\sum_{i=0}^{N-N_0}\binom{N-N_0}{i}\frac{(-1)^i}{(N_0+n-1+i)^2\,(N_0+n+i)^2}, \tag{18}$$

$$\sum_{N_0=N_\alpha}^{N} P_4(N_0) \ge \alpha, \tag{19}$$

and

$$n^2(n-1)^2\int_{\bar{R}_\alpha}^{1} \xi^{n-2}\,[\,2(\xi-1) - (\xi+1)\log\xi\,]\,d\xi = \alpha. \tag{20}$$

The derivations of results analogous to (15), (16), (17), (18), (19), (20) for tolerance limits defined by other order statistics than least and greatest and also for more than two independent⁶ quality characteristics are straightforward.


6. Further Remarks and Discussion. For a given set of tolerance limits on a random variable $X$ as determined by an initial sample of size $n$, we have discussed the problem of predicting, with a given degree of probability, at least what proportion of values of $X$ in a further sample (finite or indefinitely large) will lie between these tolerance limits. We have obtained theoretical results

⁶ In a paper to appear in a forthcoming issue of the Annals of Math. Stat., A. Wald has shown how to set up tolerance limits for the case of two or more statistically dependent variables.





which depend only on the assumption that $X$ is a continuous random variable with some probability element $f(x)\,dx$, where $f(x)$ is not assumed known.

It should be emphasized that the concept of a random variable is very broad in the sense that $X$ may be a random variable determined as a result of calculations on other random variables. For example, $X$ may be the difference, product, or ratio of two random variables, or the average or any other "reasonable" function of several random variables which may be of interest in any given situation. Thus, on the basis of an initial sample of differences of two random variables, we may set up tolerance limits of differences and make predictions, for a given probability level, as to how many differences in a further sample of differences will lie between these tolerance limits. Similarly for products, ratios, and other functions of random variables.

From the point of view of practical application, we should again note that the mathematical assumption that $X$ is a random variable means that a state of statistical control as described in §1 must exist in the measurements to which the tolerance limit prediction theory is to be applied. In practice $X$ is often a discrete variable, i.e. one which can take on only certain isolated values. For example, if $X$ is the number of defective product-pieces in a drawing of one product-piece, $X$ is either 0 or 1, depending on whether the piece was non-defective or defective. Our theory would not be applicable to such a case. However, if we take as a new variable the average value of $X$ for several product-pieces, we then obtain a variable that is continuous enough for the tolerance limit theory to be applicable for all practical purposes.

Finally, we remark that although we have used, as concrete examples, situa¬ 
tions in mass production engineering, the notions of tolerance limits and predic¬ 
tions within tolerance limits which have been discussed apply equally well to 
situations in any branch of applied science where measurements are made and 
used as a basis for predictions concerning future measurements. 

7. Summary. After a state of statistical control has been established with 
respect to a quality characteristic of product-pieces in mass production by the 
standard statistical quality control methods developed and refined by Shewhart 
and others, there remains the problem of determining the accuracy of predic¬ 
tions as to how many future product-pieces will fall within tolerance limits 
specified by measurements on product-pieces already produced under the given 
state of control. This problem and some of its extensions are discussed in the 
present paper. 

More specifically, suppose an initial sample of n product-pieces, manufactured 
under a given state of statistical control, are measured with respect to a given 
quality characteristic. Let X be a variable which measures the given charac¬ 
teristic, so that X has a definite value for each product-piece. Let Xi be the 
smallest and Xn the largest value of X which occurs in the initial sample. Now 
consider a further sample of size N. The following problems of prediction re¬ 
lating to the second sample from information yielded by the initial sample are 





considered: (1) What is the probability that at least $N_0$ values of $X$ in the second sample will exceed the lower tolerance limit $X_1$ set by the first sample? (2) What is the probability that at least $N_0$ values of $X$ in the second sample will lie between the two tolerance limits $X_1$ and $X_n$ set by the first sample? (3) For given values of $n$ and $N$ and $\alpha$ (e.g. .99 or .95), what is the largest integer $N_\alpha$ such that the probability is at least $\alpha$ that $N_0 \ge N_\alpha$? (4) What is the limiting value of $N_\alpha/N$ as $N$ increases indefinitely? Tables of values of $N_\alpha$ and $R_\alpha$ are given for each of the two problems (1) and (2), for several important combinations of values of $n$ and $N$ and for $\alpha$ = .99 and .95.

Problems similar to (1), (2) and (3) are discussed for the case in which tolerance limits are placed on two or more quality characteristics simultaneously. The generality of the theory of tolerance limits and how it applies to differences, products and ratios and other functions of two or more random variables are briefly discussed.



GENERALIZED POISSON DISTRIBUTION

By F. E. Satterthwaite
Aetna Life Insurance Company

1. Introduction. The Poisson distribution is one of the most fundamental of statistical distributions. It is the distribution law for the number of events if the probability of an event happening in any infinitesimal unit of time is independent of the probability of its happening in any other unit of time. Frequently when we analyze statistics which obey the Poisson law it is desirable to give varying weights to the different events instead of considering them all of equal value. Such is the case in analyzing insurance statistics where the events are the claims received by the office and the weights are the cost of the claim to the company. We shall now show how the Poisson distribution can be generalized so as to be adequate for such an analysis.

2. First development. Let $f(x, \alpha)$ be the distribution function of the weights assigned to the events where the variable, $x$, refers to the weight and the variable, $\alpha$, refers to time. The characteristic function of $f(x, \alpha)$ is

$$\varphi(t, \alpha) = \int e^{itx} f(x, \alpha)\,dx.$$

Also let $p(\alpha)\,d\alpha$ be the probability that an event will occur in the infinitesimal unit of time, $\alpha$ to $\alpha + d\alpha$. If $y$ represents the sum of the weights, the distribution function of $y$ for this unit of time is

$$F_{d\alpha}(y, \alpha) = \begin{cases} 1 - p(\alpha)\,d\alpha, & y = 0 \\ f(y, \alpha)\,p(\alpha)\,d\alpha, & y > 0. \end{cases} \tag{1}$$

The characteristic function of this distribution is

$$\begin{aligned} \Phi_{d\alpha}(t, \alpha) &= (1 - p(\alpha)\,d\alpha) + p(\alpha)\,d\alpha \int e^{ity} f(y, \alpha)\,dy \\ &= 1 - p(\alpha)\,d\alpha\,(1 - \varphi(t, \alpha)) \\ &= e^{-p(\alpha)\,d\alpha\,(1 - \varphi(t, \alpha))}. \end{aligned} \tag{2}$$

In forming equations (1) and (2) we ignore infinitesimals of orders higher than the first in $d\alpha$.

The expected number of events in the period of time from $\alpha_1$ to $\alpha_2$ is

$$P = \int_{\alpha_1}^{\alpha_2} p(\alpha)\,d\alpha,$$

and the mean distribution of weights during the same period of time is

$$f(x) = \int_{\alpha_1}^{\alpha_2} [p(\alpha)/P]\,f(x, \alpha)\,d\alpha.$$

POISSON DISTRIBUTION


The characteristic function of this mean distribution of weights is

$$\varphi(t) = \int e^{itx} f(x)\,dx = \int_{\alpha_1}^{\alpha_2} [p(\alpha)/P]\,\varphi(t, \alpha)\,d\alpha.$$

These equations are based on the assumption that the probability of an event occurring in any unit of time is independent of the probability of its occurrence in any other unit of time and also the assumption that the weights assigned to each event are independent. These assumptions are implied in all that follows.

Since the characteristic function of the sum of independent variables is equal to the product of the respective characteristic functions, the characteristic function of the sum of the weights during the period of time, $\alpha_1$ to $\alpha_2$, is

$$\begin{aligned} \Phi(t) &= \prod_{\alpha=\alpha_1}^{\alpha_2} \Phi_{d\alpha}(t, \alpha) \\ &= e^{-\int p(\alpha)\,d\alpha + \int p(\alpha)\,\varphi(t, \alpha)\,d\alpha} \\ &= e^{-P + P\varphi(t)}. \end{aligned} \tag{3}$$

Applying the Fourier transformation, the distribution function of the sum of the weights is

$$F(y) = \frac{1}{2\pi}\int_{-\infty}^{\infty} e^{-ity}\,\Phi(t)\,dt.$$

Equation (3) gives a convenient method for defining a generalized Poisson distribution. Any distribution which has a characteristic function in the form of $\Phi(t)$, where $\varphi(t)$ is the characteristic function of an arbitrary distribution, will have all the properties of a generalized Poisson distribution.


3. Second development. If we let $\varphi(t)$ represent the characteristic function of an arbitrary distribution, the characteristic function of the sum of $n$ independent items obeying such a distribution law is $\varphi_n(t) = [\varphi(t)]^n$. If instead of considering $n$ to be a fixed quantity we assume that it is an independent statistical variable obeying the Poisson distribution law with mean $P$, the characteristic function of the sum, $y$, of the items of the sample becomes

$$\Phi(t) = \sum_{n=0}^{\infty} \frac{P^n\,[\varphi(t)]^n\,e^{-P}}{n!} = e^{-P + P\varphi(t)}.$$

Therefore $y$ is seen to obey the generalized Poisson distribution law.
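The identity between the Poisson-weighted series and the exponential form can be checked numerically at any chosen $t$. The sketch below is illustrative only: the weight distribution is assumed, for the sake of the example, to be exponential with unit mean (characteristic function $1/(1 - it)$), and the function names, the value of $P$ and the truncation point are likewise assumptions rather than anything taken from the paper.

```python
import cmath
from math import factorial


def weight_cf(t):
    """Characteristic function of the illustrative weights (unit-mean exponential)."""
    return 1 / (1 - 1j * t)


def cf_by_series(t, P, terms=80):
    """Sum over n of P**n * phi(t)**n * exp(-P) / n!, truncated after `terms` terms."""
    phi = weight_cf(t)
    return sum(P ** n * phi ** n * cmath.exp(-P) / factorial(n)
               for n in range(terms))


def cf_closed_form(t, P):
    """exp(-P + P * phi(t)), the closed form of the same series."""
    return cmath.exp(-P + P * weight_cf(t))


difference = abs(cf_by_series(0.7, 3.3) - cf_closed_form(0.7, 3.3))
print(difference < 1e-12)  # prints True
```

Any other characteristic function could be substituted for `weight_cf`; the agreement depends only on the exponential series, not on the particular weights.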


4. Properties. The generalized Poisson distribution preserves the unique and 
very important property of the Poisson distribution that nowhere in its develop¬ 
ment is it necessary to make any assumptions regarding homogeneity. The 



F. E. SATTERTHWAITE


only requirement is that the occurrence of and weight assigned to any event shall be independent of the occurrence of or weight assigned to any other event.

The distribution of the sum of the weights is a function of the expected number of events, $P$, and of the mean distribution of weights, $f(x)$, alone. It is independent of the way in which $P$ and $f(x)$ are made up. Thus, if we are studying the distribution of the sum of the weights over a period of a year and if $P$ and $f(x)$ vary with the seasons, the distribution of $y$ is no different than it would be if $P$ and $f(x)$ were constant. It is only necessary that the $f(x)$'s for the different seasons be weighted in proportion to the expected number of events in determining the mean $f(x)$.

Note also that in the first development it is not necessary that the variable, $\alpha$, refer to time. It could just as well refer to different classes of events distinguished on any other basis. Therefore, heterogeneous material may be combined in an analysis if it is possible to determine the appropriate mean distribution of weights.

For a given weight distribution the generalized Poisson distribution for an expected number of events, $nP$, is identical with the distribution of the sum of $n$ independent items each of which obeys a generalized Poisson distribution with $P$ expected events.

Because of the property described in the preceding paragraph it is immediately apparent that a generalized Poisson distribution obeys the law of large numbers. As the number of expected events increases the distribution approaches the normal distribution.

5. Moments. The moments of a generalized Poisson distribution are functions of the moments of the underlying weight distribution. By differentiating the characteristic function we obtain the following formulas in which the pre-subscript, 0, refers to the moments of the weight distribution, $f(x)$, taken about zero, and $\mu_2$, $\mu_3$, $\mu_4$ are moments about the mean:

$$\begin{aligned} \mu_1 &= P\,{}_0\mu_1 = m \\ \mu_2 &= P\,{}_0\mu_2 = \sigma^2 \\ \mu_3 &= P\,{}_0\mu_3 \\ \mu_4 &= P\,{}_0\mu_4 + 3\,(P\,{}_0\mu_2)^2. \end{aligned}$$

The above formulas may be verified through general reasoning by considering the moments of the distribution, $F_{d\alpha}(y, \alpha)$ (see equation (1)). This distribution refers to an infinitesimal unit of time and all the moments about zero are infinitesimals of the first order. In passing from the moments about zero to the moments about the mean the corrections are all infinitesimals of at least the second order. Therefore, the corrections may be ignored and the moments about the mean may be considered to be equal to those about zero. The above formulas follow if we take a sample of size $P/(p\,d\alpha)$ from this population.
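The moment formulas can be checked on a small discrete example by building the generalized Poisson distribution directly as a Poisson mixture of $n$-fold sums. This is only a sketch: the weight distribution, the value of $P$, the truncation at 60 events and all function names are hypothetical choices made for illustration.

```python
from math import exp, factorial

weights = {1: 0.5, 2: 0.3, 3: 0.2}   # hypothetical weight distribution f(x)
P = 2.0                               # hypothetical expected number of events


def convolve(a, b):
    """Distribution of the sum of two independent variables given as {value: prob}."""
    out = {}
    for x, px in a.items():
        for y, py in b.items():
            out[x + y] = out.get(x + y, 0.0) + px * py
    return out


def generalized_poisson(weights, P, max_events=60):
    """Mix the n-fold convolutions of `weights` with Poisson(P) probabilities."""
    dist, n_fold = {}, {0: 1.0}       # n_fold starts as the sum of zero weights
    for n in range(max_events + 1):
        pn = exp(-P) * P ** n / factorial(n)
        for y, py in n_fold.items():
            dist[y] = dist.get(y, 0.0) + pn * py
        n_fold = convolve(n_fold, weights)
    return dist


def raw_moment(dist, r):
    """Moment of order r about zero."""
    return sum(y ** r * p for y, p in dist.items())


def central_moment(dist, r):
    """Moment of order r about the mean."""
    mean = raw_moment(dist, 1)
    return sum((y - mean) ** r * p for y, p in dist.items())


dist = generalized_poisson(weights, P)
# Compare the fourth moment about the mean with P*mu4 + 3*(P*mu2)**2.
lhs = central_moment(dist, 4)
rhs = P * raw_moment(weights, 4) + 3 * (P * raw_moment(weights, 2)) ** 2
print(abs(lhs - rhs) < 1e-8)  # prints True
```

The same comparison can be made for the mean, the variance and the third moment by replacing the last three lines accordingly.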

In order to obtain Pearson’s moment functions for a generalized Poisson 
distribution for any given mean value it is convenient to calculate the following 
parameters of the weight distribution: 





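$$\begin{aligned} {}_0m &= {}_0\mu_1 \\ {}_0\sigma^2 &= {}_0\mu_2/{}_0m \\ {}_0\beta_1 &= ({}_0\mu_3/{}_0m)^2/{}_0\sigma^6 \\ {}_0(\beta_2 - 3) &= ({}_0\mu_4/{}_0m)/{}_0\sigma^4. \end{aligned} \tag{4}$$

The Pearson moment functions then take the convenient forms:

$$\begin{aligned} \sigma^2/m^2 &= {}_0\sigma^2/m \\ \beta_1 &= {}_0\beta_1/m \\ (\beta_2 - 3) &= {}_0(\beta_2 - 3)/m. \end{aligned} \tag{5}$$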

6. Further generalizations. Often the expected number of events is not known but can be estimated to a greater or less degree of accuracy. In such a case it is convenient to assume that $P$ is a statistical variable distributed about some expected value, say $P'$. A Type III distribution,

$$q(P) = \frac{b^b}{P'\,\Gamma(b)}\left(\frac{P}{P'}\right)^{b-1} e^{-bP/P'},$$

will generally be as satisfactory as any to assume for $P$. The parameter, $b$, can be chosen to give any desired standard deviation. The characteristic function of the distribution of the sum of the weights under these conditions becomes

$$\Phi'(t) = \int_0^{\infty} q(P)\,e^{-P(1-\varphi(t))}\,dP = \left[1 + \frac{(1-\varphi(t))\,P'}{b}\right]^{-b}.$$

The second development suggests another generalization. Instead of assuming that the number of events, $n$, is distributed in accord with the Poisson distribution, we may assume any discrete, non-negative distribution, $h(n)$. The distribution function for the sum of the weights is then

$$F'(y) = \sum_n h(n)\,f(y, n)$$

where $f(y, n)$ is the distribution function for the sum of $n$ independent weights.
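The variance, $\sigma^2$, of this distribution is given by the formula

$$\frac{\sigma^2}{m^2} = \frac{{}_n\sigma^2}{{}_nm^2} + \frac{1}{{}_nm}\,\frac{{}_0\sigma^2}{{}_0m^2}$$

where $m$ refers to the mean, $n$ refers to the distribution $h(n)$, and 0 refers to the weight distribution (with ${}_0\sigma^2$ here the variance of the weights about their mean). Some writers have assumed that statistics of this type are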
distributed as a product. Such an assumption is incorrect and causes an over¬ 
statement of the variance to the amount of „m-orre*'nff°'off*. 


7. Application. In Table I is shown the distribution of claims under a cer¬ 
tain plan of group sickness and accident insurance. The parameters, (4), for 
this distribution are 





$${}_0m = 3.62,\quad {}_0\sigma^2 = 8.1,\quad {}_0\beta_1 = 14,\quad {}_0(\beta_2 - 3) = 15. \tag{6}$$

This distribution is in terms of weeks per claim. The insurance company is interested in the financial cost per claim. A study shows that the distribution of the rate of weekly indemnity to which different classes of employees are entitled has the average parameters,

$${}_1m = 15.25,\quad {}_1\sigma^2 = 16.5,\quad {}_1\beta_1 = 20,\quad {}_1(\beta_2 - 3) = 25. \tag{7}$$

Since the moment about zero of the product of independent statistics is equal to the product of the moments, it is permissible to multiply together the corresponding parameters of (6) and (7) to obtain the average parameters for the distribution of the financial cost per claim. These are

$${}_2m = 55.2,\quad {}_2\sigma^2 = 134,\quad {}_2\beta_1 = 280,\quad {}_2(\beta_2 - 3) = 375.$$

TABLE I

 Nearest Duration of     Number of Claims per Year
 Claim in Weeks          per 10,000 Employees
        0                        197
        1                        418
        2                        173
        3                        109
        4                         84
        5                         58
        6                         45
        7                         36
        8                         27
        9                         24
       10                         20
       11                         17
       12                         14
       13                        128

In order to study the distribution of cost under a group of policies for each of which \$180 in claims is expected, we apply equations (5) to obtain the parameters,

$$\sigma^2/m^2 = .74,\quad \beta_1 = 1.6,\quad \beta_2 - 3 = 2.1. \tag{8}$$

Since the expected number of claims is

$$P = 180/55.2 = 3.3$$

the probability that there will not be any claims under a policy is

$$h(0) = \frac{1}{0!}\,(3.3)^0 e^{-3.3} = .037.$$
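The arithmetic of this application can be retraced as follows. The sketch computes the parameters (4) from the frequencies of Table I, then applies equations (5) to the published cost-per-claim parameters for an expected cost of 180 dollars. The function and variable names are illustrative assumptions, and the rounding of $P$ to 3.3 before exponentiating is inferred from the figures quoted in the text.

```python
from math import exp

# Table I: nearest duration of claim in weeks -> claims per year per 10,000 employees.
claims = {0: 197, 1: 418, 2: 173, 3: 109, 4: 84, 5: 58, 6: 45,
          7: 36, 8: 27, 9: 24, 10: 20, 11: 17, 12: 14, 13: 128}


def moment_about_zero(table, r):
    """Moment of order r about zero of a frequency table {value: count}."""
    total = sum(table.values())
    return sum(k ** r * c for k, c in table.items()) / total


def weight_parameters(table):
    """Parameters (4) of a weight distribution given as a frequency table."""
    m = moment_about_zero(table, 1)
    s2 = moment_about_zero(table, 2) / m
    b1 = (moment_about_zero(table, 3) / m) ** 2 / s2 ** 3
    k = (moment_about_zero(table, 4) / m) / s2 ** 2
    return m, s2, b1, k


def pearson_functions(params, mean):
    """Equations (5): sigma^2/m^2, beta_1 and beta_2 - 3 for a given mean."""
    _, s2, b1, k = params
    return s2 / mean, b1 / mean, k / mean


m0, s2_0, b1_0, k_0 = weight_parameters(claims)
cost = (55.2, 134, 280, 375)              # published cost-per-claim parameters
ratio, beta1, excess = pearson_functions(cost, 180)
P = 180 / cost[0]
p_no_claims = exp(-round(P, 1))           # P is rounded to 3.3 first, as in the text
```

Run on Table I, `weight_parameters` returns values that round to the figures in (6), which supports reading the pre-subscripted moments in (4) as moments about zero.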







Adjusting the parameters, (8), to remove the zero claims and choosing the scale so as to express the results as loss ratios gives the parameters,

$$m = 61.6\%,\quad \sigma = 52.8\%,\quad \beta_1 = 1.57,\quad \beta_2 = 4.90.$$

A Pearson Type I curve fitted to these parameters intersects the axis well below the zero point. Therefore $\beta_2$ was reduced to 4.59 which gives the expected distribution shown in Table II.

Table II also shows the actual distribution of loss ratios experienced by one of the larger group insurance carriers under policies in this class. The Chi-square test for goodness of fit gives,

$$\chi^2 = 23,\quad 14 \text{ degrees of freedom},$$

which corresponds to a probability of 5 per cent. Thus it is apparent that theory and experience are in fair agreement considering that no allowance was made for the lack of homogeneity "between policies." (This should not be confused with the homogeneity "within policies" covered in the theory.)

TABLE II

Experience under Group Sickness and Accident Insurance Policies

 Ratio of Losses         Number of Policies
 to Premiums            Expected      Actual
      0                    18           11
  .01- .09                 47           37
  .10- .19                 53           45
  .20- .29            [illegible]       56
  .30- .39                 45           38
  .40- .49                 41           47
  .50- .59                 36           39
  .60- .69                 32           41
  .70- .79                 28           37
  .80- .89                 24           20
  .90- .99                 21           29
 1.00-1.19                 32           30
 1.20-1.39                 23           22
 1.40-1.59                 17           22
 1.60-1.99                 19           14
 2.00 and over             11            9

If the expected number of events is small, especially if the weight distribution is irregular or discrete, it is sometimes advisable to use the following method.

1. Use summation or approximate integration to obtain the distribution, $f(y, n)$, of the sum of $n$ independent weights for $n$ = 1, 2, 3, and 4. The formula is

$$f(y, n+1) = \int_0^y f(x, 1)\,f(y - x, n)\,dx.$$

2. Determine the generalized Poisson distribution for $P$, the expected number of events, equal to some small number, say ½. The formula is

$$F(y, P) = \sum_n \frac{P^n e^{-P}}{n!}\,f(y, n).$$

Fig. 1. Surgical Fee Insurance. Distribution, $f(y, n)$, of the sum of $n$ independent claims, and distribution, $F(y, P)$, of the sum of the claims when $P$ claims are expected. The average claim is \$60.

Example If the expected claims under a policy are $100 (P !«• 3) and if the actual claims 
are S4SK), the probability of an experienco as bad as this occurring because of chance factors 
is 0 1%. 


3. Use summation or approximate integration to obtain $F(y, P)$ for $P$ = 1, 2, 4, $\ldots$ by the formula

$$F(y, 2P) = \int_0^y F(x, P)\,F(y - x, P)\,dx.$$

4. If the calculations are carried on from both tails and if the results are plotted on probability graph paper, it is often possible to fill in the central sections by interpolation. Such interpolations should be adjusted to reproduce the correct mean. This method is illustrated in Fig. 1 in the case of surgical fee insurance.
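Steps 1 to 3 can be sketched for weights confined to whole numbers, with summation in place of integration. Everything named below is an illustrative assumption: the weight probabilities are hypothetical, `weights[j]` is the probability of weight `j`, and only the first `size` values of each distribution are kept. The case of no events (total weight zero, probability $e^{-P}$) is carried explicitly so that the doubling in step 3 is exact.

```python
from math import exp, factorial


def convolve(a, b, size):
    """First `size` terms of the convolution of two probability sequences."""
    out = [0.0] * size
    for i, ai in enumerate(a[:size]):
        for j, bj in enumerate(b[:size - i]):
            out[i + j] += ai * bj
    return out


def generalized_poisson(weights, P, size, max_events=4):
    """Step 2: Poisson mixture of the n-fold sums f(y, n) built up as in step 1."""
    n_fold = [1.0] + [0.0] * (size - 1)          # n = 0: no events, total weight 0
    dist = [0.0] * size
    for n in range(max_events + 1):
        pn = exp(-P) * P ** n / factorial(n)
        dist = [d + pn * f for d, f in zip(dist, n_fold)]
        n_fold = convolve(n_fold, weights, size)  # step 1: f(y, n + 1) from f(y, n)
    return dist


def double(dist, size):
    """Step 3: F(y, 2P) as the convolution of F(y, P) with itself."""
    return convolve(dist, dist, size)


weights = [0.0, 0.5, 0.3, 0.2]                   # hypothetical weights 1, 2, 3
size = 40
half = generalized_poisson(weights, 0.5, size)   # P = 1/2, events up to n = 4
one = double(half, size)                         # P = 1
two = double(one, size)                          # P = 2
```

With $P = \tfrac12$ the neglected terms beyond $n = 4$ carry a probability of roughly two parts in ten thousand, which is why a small starting value of $P$ is convenient; raising `max_events` removes the truncation altogether.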

8. Summary. In this paper the Poisson distribution is generalized to allow 
for the assignment of varying weights to events when the number of events 
follows the Poisson law. The ability of the Poisson distribution to handle 
heterogeneous data is preserved in the generalization. An example is given 
showing that the distribution of certain insurance statistics agrees with that pre¬ 
dicted by the theory. 



THE CONSTRUCTION OF ORTHOGONAL LATIN SQUARES¹

By Henry B. Mann²

Columbia University

A Latin square is an arrangement of $m$ variables $x_1, x_2, \ldots, x_m$ into $m$ rows and $m$ columns such that no row and no column contains any of the variables twice. Two Latin squares are called orthogonal if when one is superimposed upon the other every ordered pair of variables occurs once in the resulting square.

The rows of a Latin square are permutations of the row $x_1, x_2, \ldots, x_m$. Let $P_i$ be the permutation which transforms $x_1, x_2, \ldots, x_m$ into the $i$th row of the Latin square. Then $P_iP_j^{-1}$ leaves no variable unchanged for $i \ne j$. For otherwise one column would contain a variable twice. On the other hand each set of $m$ permutations $P_1, P_2, \ldots, P_m$ such that $P_iP_j^{-1}$ leaves no variable unchanged for $i \ne j$ generates a Latin square. We may therefore identify every Latin square with a set of $m$ permutations $(P_1, P_2, \ldots, P_m)$ such that $P_iP_j^{-1}$ leaves no variable unchanged for $i \ne j$.

Now let $(P_1, P_2, \ldots, P_m)$, $(Q_1, Q_2, \ldots, Q_m)$ be a pair of orthogonal Latin squares. We shall show that $(P_1^{-1}Q_1, P_2^{-1}Q_2, \ldots, P_m^{-1}Q_m)$ is a Latin square. $P_i^{-1}Q_i$ is the transformation which transforms the $i$th row of $(P_1, P_2, \ldots, P_m)$ into the $i$th row of $(Q_1, Q_2, \ldots, Q_m)$. Since every pair of variables occurs exactly once if the second square is superimposed upon the first, the square $(P_1^{-1}Q_1, P_2^{-1}Q_2, \ldots, P_m^{-1}Q_m)$ contains for every $i$ and $k$ a permutation which transforms $x_i$ into $x_k$. But then it can not contain two permutations which transform $x_i$ into $x_k$. This argument can be reversed and it follows that $(P_1, P_2, \ldots, P_m)$ and $(Q_1, Q_2, \ldots, Q_m)$ are orthogonal if and only if $(P_1^{-1}Q_1, P_2^{-1}Q_2, \ldots, P_m^{-1}Q_m)$ is a Latin square.

Denote now by an $m$ sided square $S$ any set of $m$ permutations
$(S_1, S_2, \ldots, S_m)$ and by the product $SS'$ of two squares $S$ and $S'$ the square
$(S_1S_1', S_2S_2', \ldots, S_mS_m')$. Then we can state: Two Latin squares $L_1$ and $L_2$
are orthogonal if and only if there exists a Latin square $L_{12}$ such that

(1)    $L_1 L_{12} = L_2.$

Now let $L_1, L_2, \ldots, L_r$ be a set of $r$ mutually orthogonal Latin squares.
Then we must have $L_i L_{ik} = L_k$, where $L_{ik}$ is a Latin square if $i \ne k$. Hence we
have the theorem

Theorem 1: The Latin squares $L_1, L_2, \ldots, L_r$ are orthogonal if and only
if there exist $r(r - 1)$ Latin squares $L_{ik}$ $(i \ne k)$ such that $L_i L_{ik} = L_k$.

Corollary: If $L^i$, $L^k$ and $L^{i-k}$ are Latin squares then $L^i$ is orthogonal to $L^k$.

For instance if $L$ and $L^2$ are Latin squares then $L$ is orthogonal to $L^2$.

¹ Presented to the Mathematical Society October 31st, 1942. After I submitted this
paper for publication Dr. Edward Fleisher sent me his thesis on Eulerian squares which
he submitted in 1934 and in which he proved Theorem 3 in a different manner.

² Research under a grant in aid of the Carnegie Corporation of New York.



If $A = (A_1, A_2, \ldots, A_m)$ and $P$ is any permutation then we put $PA =
(PA_1, PA_2, \ldots, PA_m)$ and $AP = (A_1P, A_2P, \ldots, A_mP)$. If $A$ is a Latin
square then also $AP$ and $PA$ are Latin squares. If $A$ is orthogonal to $B$ then
$AP$ is orthogonal to $BQ$ for any permutations $P$ and $Q$. For if $AC = B$ then
$AP(P^{-1}CQ) = BQ$, since the associative law holds for the operations indicated.
This means that $A$ and $B$ remain orthogonal if we permute the variables in
both squares in any arbitrary way.

Hence if $A$ is orthogonal to $B$ also $AA_1^{-1}$ is orthogonal to $BB_1^{-1}$. We can
therefore, while preserving orthogonality, always transform the pair $A$ and $B$
so that $A_1 = B_1 = 1$ where 1 denotes the identity. We shall then say that
the pair $A$, $B$ is written in the reduced form.

Definition 1: If $A$ is orthogonal to $B$, and if in the reduced form the
permutations of $A$ are the same as those of $B$ in a different order, and if these permutations
form a group $G$, then the pair $A$ and $B$ is said to be based on the group $G$.

A pair of orthogonal Latin squares is called a Graeco-Latin square. The
Graeco-Latin squares constructed by Bose [1], Stevens [2], and Fisher and Yates
[3] are all based on groups. There exist Graeco-Latin squares, however, which
are not based on a group.

If the orthogonal pair $A$, $B$ is based on a group $G$ and if $AC = B$ then also $C$
contains only permutations of $G$, and since $C$ is a Latin square it must contain
all the permutations of $G$. Calling $C_i$ the image of $A_i$, we obtain a biunique
mapping $S$ of $G$ into itself. Let $A_i^S = C_i$, then $B_i = A_i A_i^S$ and $S$ has therefore
the property that every element of $G$ is of the form $XX^S$ where $X$ is in $G$.

Definition 2: A biunique mapping $S$ of a group $G$ into itself will be called
a complete mapping if every element of $G$ can be represented in the form $XX^S$ where
$X$ is an element of $G$ and $X^S$ the image of $X$ under the mapping $S$.

If an abstract group $G$ of order $m$ admits a complete mapping $S$ then we can
immediately construct an $m$ sided Graeco-Latin square based on $G$. To do this
we represent $G$ as a regular permutation group. Let $P_1, P_2, \ldots, P_m$ be the
permutations of this representation. Then $A = (P_1, P_2, \ldots, P_m)$, $C =
(P_1^S, P_2^S, \ldots, P_m^S)$ and $B = (P_1P_1^S, P_2P_2^S, \ldots, P_mP_m^S)$ are Latin squares and
hence $A$ is orthogonal to $B$, and $AP_1^{-1}$ and $B(P_1P_1^S)^{-1}$ form a reduced pair.
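The construction just described can be sketched in code. This is only an illustration under stated assumptions: the group is passed as a list of elements with a binary operation `op`, the mapping $S$ as a dictionary, and the row belonging to $x$ is taken to be $y \mapsto y\,x$ (one of the two possible conventions for the regular representation). All names are hypothetical. The example uses the cyclic group of order 5 with $X^S = X$.

```python
def is_complete_mapping(elements, op, S):
    """S maps the group one-to-one onto itself, and so does x -> x * S(x)."""
    everything = set(elements)
    return ({S[x] for x in elements} == everything
            and {op(x, S[x]) for x in elements} == everything)

def graeco_latin(elements, op, S):
    """Row x, column y holds the pair (y*x, y*x*S(x))."""
    if not is_complete_mapping(elements, op, S):
        raise ValueError("S is not a complete mapping")
    return [[(op(y, x), op(y, op(x, S[x]))) for y in elements] for x in elements]

def is_graeco_latin(square):
    """Both components are Latin squares and all m*m pairs are distinct."""
    m = len(square)
    for k in (0, 1):
        for i in range(m):
            if len({square[i][j][k] for j in range(m)}) != m:
                return False
            if len({square[j][i][k] for j in range(m)}) != m:
                return False
    return len({cell for row in square for cell in row}) == m * m

m = 5
Z5 = list(range(m))
add = lambda x, y: (x + y) % m        # cyclic group of order 5, written additively
identity = {x: x for x in Z5}         # X^S = X; complete because m is odd
square = graeco_latin(Z5, add, identity)
```

For a cyclic group of even order the same mapping fails, because $x \mapsto 2x$ is then no longer one-to-one.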

If $L_1, L_2, \ldots, L_r$ are orthogonal Latin squares and $L_i L_{ik} = L_k$ then we
form the product

(2)    $L_1 L_{12} L_{23} \cdots L_{r-1\,r}.$

From $L_i L_{ik} = L_k$, $L_k L_{kj} = L_j$ we find $L_i L_{ik} L_{kj} = L_j$ and hence $L_{ik} L_{kj} = L_{ij}$.
$L_{ik}$ is therefore orthogonal to $L_{ij}$. The product (2) has the property that for
any $s \le r$ the product of $s$ successive factors is a Latin square. On the other
hand if a product of $r$ Latin squares $L_1, L_{12}, \ldots, L_{r-1\,r}$ has this property then
the Latin squares $L_1, L_2, \ldots, L_r$ where $L_i = L_1 L_{12} L_{23} \cdots L_{i-1\,i}$ are orthogonal.

Definition 3: A set of r orthogonal Latin squares will be called based on a 
group G if every pair in the set is based on G. 

If $L_1, L_2, \ldots, L_r$ are based on a group $G$ then $G$ must admit $r$ mappings
$S_1 = 1, S_2, \ldots, S_r$ into itself such that every element of $G$ can be written in
the form $X^{S_i + S_{i+1} + \cdots + S_{i+h}}$ for every $i$ and $h$ with $1 \le i \le r$ and $0 \le h \le r - i$,
where $X^{S+T} = X^S X^T$, and $X^S$ is the image of $X$ under the mapping $S$.

Definition 4: The mappings $S_1 = 1, S_2, \ldots, S_r$ of a group $G$ into itself
will be called r-fold complete, if every element of $G$ is of the form
$X^{S_i + S_{i+1} + \cdots + S_{i+h}}$ for every $i$ and $h$ with $1 \le i \le r$ and $0 \le h \le r - i$.

Now let $G$ be an abstract group of order $m$ admitting an r-fold complete set
of mappings $S_1 = 1, S_2, \ldots, S_r$. Put

$L_i = (1^{S_1 + \cdots + S_i},\; P_2^{S_1 + \cdots + S_i},\; \ldots,\; P_m^{S_1 + \cdots + S_i}) \qquad (i = 1, \ldots, r),$

where $1, P_2, \ldots, P_m$ is a regular representation of $G$. Then $L_1, L_2, \ldots, L_r$
is a set of $r$ orthogonal Latin squares based on $G$. Put $A_i = 1^{S_1 + \cdots + S_i}$; then
$L_1A_1^{-1}, \ldots, L_rA_r^{-1}$ are written in the reduced form. Hence we have

Theorem 2: A set of $r$ orthogonal Latin squares based on a group $G$ exists if
and only if $G$ admits an r-fold complete set of mappings.

If $G$ is of order $m = 4n + 2 = 2m'$ then $G$ has a self-conjugate subgroup
$H$ of order $m'$. Suppose $G$ admits a complete mapping $S$. We have

$G = H + HA.$

$XX^S \in H$ if either $X$ and $X^S$ or neither of them are in $H$. Further $XX^S \in HA$
if either $X$ or $X^S$ but not both of them are in $H$.

Let $a$ be the number of elements $X \in H$ such that $X^S \in H$,
$b$ the number of elements $X \in H$ such that $X^S \in HA$,
$c$ the number of elements $X \in HA$ such that $X^S \in H$,
then $a + b = m'$, $a + c = m'$. Of the products $XX^S$ exactly $b + c$ are in $HA$.
Hence $b + c = m'$, $a = b$ and therefore $m' = 2a$, which is impossible since $m'$
is odd. We have therefore:

Theorem 3: No $(4n + 2)$-sided Graeco-Latin square based on a group can
exist.
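For very small orders the statement can be checked by exhaustive search. The sketch below is an illustration, not the paper's argument: it examines only the cyclic group $Z_m$, written additively, and the function name is hypothetical. It counts the one-to-one maps $S$ for which $x \mapsto x + S(x)$ is again one-to-one.

```python
from itertools import permutations

def count_complete_mappings(m):
    """Number of bijections S of Z_m such that x -> x + S(x) is also a bijection."""
    count = 0
    for images in permutations(range(m)):      # images[x] plays the role of S(x)
        sums = {(x + images[x]) % m for x in range(m)}
        if len(sums) == m:
            count += 1
    return count

counts = {m: count_complete_mappings(m) for m in (2, 3, 5, 6)}
```

The orders 2 and 6 are of the form $4n + 2$ and yield no complete mapping, while the odd orders 3 and 5 yield several.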

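If a group $G$ admits $r$ automorphisms $T_1 = 1, T_2, \ldots, T_r$ such that $X^{T_i} \ne
X^{T_j}$ for $i \ne j$ and $X \ne 1$ then the mappings $S_1 = 1$, $X^{S_i} = X^{-T_{i-1}} X^{T_i}$ for $i =
2, 3, \ldots, r$ are r-fold complete; for if

$X^{S_i + S_{i+1} + \cdots + S_{i+h}} = Y^{S_i + S_{i+1} + \cdots + S_{i+h}}$

we have for $i = 1$

$X^{T_{1+h}} = Y^{T_{1+h}}$

and for $i > 1$

$X^{-T_{i-1}} X^{T_{i+h}} = Y^{-T_{i-1}} Y^{T_{i+h}}$

and therefore

$(YX^{-1})^{T_{i-1}} = (YX^{-1})^{T_{i+h}}$

and hence $Y = X$ in both cases since by hypothesis $X^{T_i} \ne X^{T_j}$ for $i \ne j$ and
$X \ne 1$. $X^{S_i + \cdots + S_{i+h}}$ therefore takes $m$ different values and reproduces every
element of $G$.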

If we represent $G$ as a regular permutation group then the squares $L_1 =
(1, P_2, \ldots, P_m)$, $L_2 = (1, P_2^{T_2}, \ldots, P_m^{T_2})$, $\ldots$, $L_r = (1, P_2^{T_r}, \ldots, P_m^{T_r})$ are
orthogonal Latin squares by Theorems 1 and 2. There exist however complete
mappings which are not derivable from automorphisms. For instance every
group of odd order admits the complete mapping $X^S = X$ but $X^T = X^2$ is not
an automorphism if the group is not abelian.

Most of the sets of orthogonal Latin squares that have been constructed
so far are based on abelian groups of type $(p, p, \ldots, p)$ and the mappings of
the squares of the sets into each other are automorphisms of this group. R. C.
Bose [1] and W. L. Stevens [2] for instance use the cyclic group of automorphisms
of the additive group of a G.F.$(p^n)$ induced through multiplication by the
elements of the Galois field that are different from 0. In this way they assure
that different automorphisms will map the same element into different elements.
They give a convenient method for finding a base element of the group of
automorphisms. In this way they reduce considerably the labor involved in the
construction of $p^n - 1$ orthogonal Latin squares of side $p^n$. The $9 \times 9$ squares
in the statistical tables by Fisher and Yates [3] are also based on the abelian
group of type (3, 3) but another set of automorphisms is used.

If $m = p_1^{e_1} p_2^{e_2} \cdots p_n^{e_n}$ ($p_i$ prime, $p_i \ne p_k$ for $i \ne k$) and if $r = \min_i p_i^{e_i} - 1$
then a set of $r$ orthogonal Latin squares can always be constructed from the
abelian group of type $(p_1, \ldots, p_1, p_2, \ldots, p_2, \ldots, p_n, \ldots, p_n)$ and its
automorphisms. This can be done by finding $r$ automorphisms $T_1^{(i)}, T_2^{(i)}, \ldots, T_r^{(i)}$
for each of the subgroups of order $p_i^{e_i}$ such that $T_j^{(i)} (T_k^{(i)})^{-1}$ $(j \ne k)$ leaves no
element unchanged except 1. If we apply the automorphisms $T_j^{(1)}, T_j^{(2)}, \ldots, T_j^{(n)}$
simultaneously, for $j = 1, 2, \ldots, r$, we obtain $r$ automorphisms of the desired type.

Once the automorphisms are known the construction of the set of orthogonal
Latin squares can easily be carried out. To do this we have to write down the
multiplication table of the group and obtain the orthogonal squares by
interchanging the rows in accord with the automorphisms.
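The row-interchange procedure can be illustrated on the simplest case. The sketch below assumes the cyclic group $Z_p$ ($p$ prime), written additively, with the automorphisms $x \mapsto kx$ for $k = 1, \ldots, p - 1$; the function names are hypothetical and the choice $p = 7$ is arbitrary.

```python
def automorphism_squares(p):
    """The p - 1 squares obtained from Z_p and the automorphisms x -> k*x."""
    table = [[(i + j) % p for j in range(p)] for i in range(p)]   # addition table
    # Square k lists the rows of the table in the order k*0, k*1, ..., k*(p-1).
    return [[table[(k * i) % p] for i in range(p)] for k in range(1, p)]

def are_orthogonal(a, b):
    """True if superimposing a on b yields every ordered pair exactly once."""
    m = len(a)
    return len({(a[i][j], b[i][j]) for i in range(m) for j in range(m)}) == m * m

squares = automorphism_squares(7)
mutually_orthogonal = all(
    are_orthogonal(squares[s], squares[t])
    for s in range(len(squares))
    for t in range(s + 1, len(squares))
)
```

Two distinct multipliers $k$ and $k'$ never send a nonzero element to the same image, which is the condition required of the automorphisms in the text.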

Definition 5: A set of orthogonal Latin squares derived from a group and its
automorphisms will be called constructed by the automorphism method.

We now prove: 

Theorem 4. Let $c_q$ be the number of classes of elements of order $q$ of a group $G$.
Let $s = \min c_q$; then not more than $s$ orthogonal Latin squares can be constructed
from $G$ by the automorphism method.

Proof: Let $T$ be an automorphism which leaves no element unchanged
except 1. If $A$ is of order $q$ then $A^T$ is also of order $q$. If $A^T = P^{-1}AP$ then
there exists an element $Q$ such that $P = Q^{-1}Q^T$ because, as we have shown, every
element can be represented in the form $X^{-1}X^T$. But then

$(QAQ^{-1})^T = QPP^{-1}APP^{-1}Q^{-1} = QAQ^{-1}.$

Hence $A = 1$. $T$ can therefore not transform any element except 1 into an
element of the same class. Hence not more than $s = \min c_q$ automorphisms
$T_1, \ldots, T_s$ can exist such that $T_i T_j^{-1}$ leave no element except 1 fixed, and this
proves our theorem.

Corollary. If $m = p_1^{e_1} p_2^{e_2} \cdots p_n^{e_n}$ ($p_i$ prime, $p_j \ne p_k$ for $j \ne k$) then not more
than $r = \min_i p_i^{e_i} - 1$ orthogonal $m$-sided Latin squares can be constructed from
any group with the automorphism method.





Proof: The Sylow group of order $p_i^{e_i}$ contains a representative of every class
of elements of order $p_i$, hence $\min c_q \le \min_i p_i^{e_i} - 1$.

Below are given two examples of Graeco-Latin squares obtained from
complete mappings which are not obtained from automorphisms. Neither could
have been obtained by combining Graeco-Latin squares constructed by the
method of Bose [1] and Stevens [2].

The first example is based on the abelian group of type (2, 2, 3). If the basis
elements are defined by $P^2 = R^2 = Q^3 = 1$ the complete mapping used is given by

$L_1 = (1, P, R, PR, Q, PQ, RQ, PRQ, Q^2, PQ^2, RQ^2, PRQ^2)$

$L_{12} = (1, RQ, PRQ^2, PQ^2, Q, RQ^2, PR, P, Q^2, R, PRQ, PQ)$

$L_2 = (1, PRQ, PQ^2, RQ^2, Q^2, PR, PQ, RQ, Q, PRQ^2, P, R).$
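The first example can be checked mechanically. The sketch below is an illustration under stated assumptions: group elements are encoded as exponent triples $(a, b, c)$ of $P^aR^bQ^c$, the words are written with `Q2` standing for $Q^2$, and the elements are numbered 1 to 12 in the order of $L_1$. The helper names are hypothetical.

```python
def parse(word):
    """Exponents (a, b, c) of P^a R^b Q^c for a word such as 'PRQ2' or '1'."""
    a = 1 if "P" in word else 0
    b = 1 if "R" in word else 0
    c = 2 if "Q2" in word else (1 if "Q" in word else 0)
    return (a, b, c)

def mul(x, y):
    """Product in the abelian group of type (2, 2, 3)."""
    return ((x[0] + y[0]) % 2, (x[1] + y[1]) % 2, (x[2] + y[2]) % 3)

L1 = [parse(w) for w in "1 P R PR Q PQ RQ PRQ Q2 PQ2 RQ2 PRQ2".split()]
L12 = [parse(w) for w in "1 RQ PRQ2 PQ2 Q RQ2 PR P Q2 R PRQ PQ".split()]
L2 = [parse(w) for w in "1 PRQ PQ2 RQ2 Q2 PR PQ RQ Q PRQ2 P R".split()]

# The elementwise product L1 * L12, which should reproduce L2.
product_L1_L12 = [mul(x, s) for x, s in zip(L1, L12)]

number = {g: k + 1 for k, g in enumerate(L1)}   # 1 -> 1, P -> 2, ..., PRQ2 -> 12

# Cell (i, j) holds the pair (X_i X_j, X_i^(1+S) X_j), numbered as above.
square = [[(number[mul(x, y)], number[mul(z, y)]) for y in L1]
          for x, z in zip(L1, L2)]

all_pairs = {cell for row in square for cell in row}
```

If all 144 pairs are distinct, the superimposed squares form a Graeco-Latin square of side 12.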

The second square is based on the regular representation of the
alternating group in 4 variables. The generating relations are $P^2 = R^2 = Q^3 = 1$,
$QP = RQ$, $QR = PRQ$. The complete mapping is given by

$L_1 = (1, P, R, PR, Q, PQ, RQ, PRQ, Q^2, PQ^2, RQ^2, PRQ^2)$

$L_{12} = (1, R, PR, P, Q, PQ, RQ, PRQ, Q^2, PQ^2, RQ^2, PRQ^2)$

$L_2 = (1, PR, P, R, Q^2, PRQ^2, PQ^2, RQ^2, Q, RQ, PRQ, PQ).$


EXAMPLE 1

1,1    2,2    3,3    4,4    5,5    6,6    7,7    8,8    9,9    10,10  11,11  12,12
2,8    1,7    4,6    3,5    6,12   5,11   8,10   7,9    10,4   9,3    12,2   11,1
3,10   4,9    1,12   2,11   7,2    8,1    5,4    6,3    11,6   12,5   9,8    10,7
4,11   3,12   2,9    1,10   8,3    7,4    6,1    5,2    12,7   11,8   10,5   9,6
5,9    6,10   7,11   8,12   9,1    10,2   11,3   12,4   1,5    2,6    3,7    4,8
6,4    5,3    8,2    7,1    10,8   9,7    12,6   11,5   2,12   1,11   4,10   3,9
7,6    8,5    5,8    6,7    11,10  12,9   9,12   10,11  3,2    4,1    1,4    2,3
8,7    7,8    6,5    5,6    12,11  11,12  10,9   9,10   4,3    3,4    2,1    1,2
9,5    10,6   11,7   12,8   1,9    2,10   3,11   4,12   5,1    6,2    7,3    8,4
10,12  9,11   12,10  11,9   2,4    1,3    4,2    3,1    6,8    5,7    8,6    7,5
11,2   12,1   9,4    10,3   3,6    4,5    1,8    2,7    7,10   8,9    5,12   6,11
12,3   11,4   10,1   9,2    4,7    3,8    2,5    1,6    8,11   7,12   6,9    5,10


EXAMPLE 2

1,1    2,2    3,3    4,4    5,5    6,6    7,7    8,8    9,9    10,10  11,11  12,12
2,4    1,3    4,2    3,1    6,8    5,7    8,6    7,5    10,12  9,11   12,10  11,9
3,2    4,1    1,4    2,3    7,6    8,5    5,8    6,7    11,10  12,9   9,12   10,11
4,3    3,4    2,1    1,2    8,7    7,8    6,5    5,6    12,11  11,12  10,9   9,10
5,9    7,12   8,10   6,11   9,1    11,4   12,2   10,3   1,5    3,8    4,6    2,7
6,12   8,9    7,11   5,10   10,4   12,1   11,3   9,2    2,8    4,5    3,7    1,6
7,10   5,11   6,9    8,12   11,2   9,3    10,1   12,4   3,6    1,7    2,5    4,8
8,11   6,10   5,12   7,9    12,3   10,2   9,4    11,1   4,7    2,6    1,8    3,5
9,5    12,7   10,8   11,6   1,9    4,11   2,12   3,10   5,1    8,3    6,4    7,2
10,7   11,5   9,6    12,8   2,11   3,9    1,10   4,12   6,3    7,1    5,2    8,4
11,8   10,6   12,5   9,7    3,12   2,10   4,9    1,11   7,4    6,2    8,1    5,3
12,6   9,8    11,7   10,5   4,10   1,12   3,11   2,9    8,2    5,4    7,3    6,1





REFERENCES

[1] R. C. Bose, "On the application of the properties of Galois fields to the problem of
    construction of Hyper-Graeco-Latin squares," Sankhya, 1938.

[2] W. L. Stevens, "The completely orthogonalized Latin square," Annals of Eugenics,
    1939.

[3] R. A. Fisher and F. Yates, Statistical Tables for Agricultural, Biological, and Medical
    Research, Edinburgh, Oliver and Boyd.



A METHOD OF DETERMINING EXPLICITLY THE COEFFICIENTS
OF THE CHARACTERISTIC EQUATION

By P. A. Samuelson

Massachusetts Institute of Technology

1. Introduction. When an investigator is interested in all of the latent roots
of the characteristic equation of a matrix and not in its latent vectors, it is
sometimes desirable to expand out the determinantal equation in order to
determine explicitly the polynomial coefficients $(p_1, p_2, \ldots, p_n)$ in the expression

(1)    $D(\lambda) = |\lambda I - a| = \lambda^n + p_1\lambda^{n-1} + \cdots + p_{n-1}\lambda + p_n.$

This can be done in a variety of ways, all of which are necessarily somewhat
tedious for high order matrices. Except for sign the coefficients are respectively
the sum of all principal minors of a given order. These can be computed
efficiently by "pivotal" methods [1]. Alternatively through the utilization of
the Cayley-Hamilton theorem, whereby a matrix satisfies its own characteristic
equation, the p's appear as the solution of $n$ linear equations [2, 3]. In a third
method Horst has employed Newton's formula concerning the powers of roots
to derive the p's as the solution of a triangular set of equations, the coefficients
of the latter only being attained after considerable matrix multiplication [4].
A fourth method, suggested to me by Professor E. Bright Wilson, Jr. of Harvard
University, consists of evaluating $D(\lambda)$ for $n$ values of $\lambda$, presumably by efficient
"Doolittle" methods; to these $n$ points, Lagrange's interpolation formula is
applied to determine the $n$ coefficients explicitly.


2. The New Method. The present paper describes a new computational
method based upon well-known dynamical considerations. A single nth order
differential equation can be converted into "normal" form, involving $n$ first order
differential equations. This is easily done by defining appropriate new variables.
If the original nth order differential equation is written as

(2)    $X^{(n)}(t) + p_1X^{(n-1)}(t) + \cdots + p_{n-1}X'(t) + p_nX(t) = 0,$

then the new normal system can be written as

(3)    $X_i'(t) = \sum_{j=1}^{n} b_{ij}X_j(t) \qquad (i = 1, \ldots, n)$

where

(4)    $b = \begin{bmatrix} 0 & 1 & 0 & \cdots & 0 \\ 0 & 0 & 1 & \cdots & 0 \\ \vdots & & & & \vdots \\ 0 & 0 & 0 & \cdots & 1 \\ -p_n & -p_{n-1} & -p_{n-2} & \cdots & -p_1 \end{bmatrix}$

is the so-called companion matrix to the polynomial in question.
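As a small worked instance of (2) to (4), take $n = 2$. The choice of new variables $X_1 = X$, $X_2 = X'$ is the customary one and is assumed here rather than taken from the text.

```latex
X'' + p_1 X' + p_2 X = 0, \qquad X_1 = X,\quad X_2 = X'
\;\Longrightarrow\;
\begin{pmatrix} X_1' \\ X_2' \end{pmatrix}
=
\begin{pmatrix} 0 & 1 \\ -p_2 & -p_1 \end{pmatrix}
\begin{pmatrix} X_1 \\ X_2 \end{pmatrix},
\qquad
\left|\lambda I - b\right|
= \begin{vmatrix} \lambda & -1 \\ p_2 & \lambda + p_1 \end{vmatrix}
= \lambda^2 + p_1\lambda + p_2 .
```

The characteristic polynomial of the companion matrix thus reproduces the coefficients of the original equation.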



The reverse process of going from a normal system in many variables to a
single high order equation is not so simple. Yet it can be done, and in so doing
we attain the required polynomial coefficients [5]. If

(5)    $x'(t) = a\,x(t)$

represents the normal system in matrix form, then symbolically

(6)    $D\!\left(\frac{d}{dt}\right)X_i(t) = X_i^{(n)}(t) + p_1X_i^{(n-1)}(t) + \cdots + p_{n-1}X_i'(t) + p_nX_i(t) = 0.$

Because we wish to find out the expanded form of $D(\lambda)$, this relationship is of
no use to us. Since similar matrices have the same characteristic equation,
ours is the problem of finding a non-singular matrix $C$, such that

(7)    $C^{-1}aC = b,$

where $b$ is of the form given in equation (4).

This problem can be approached from an elementary algebraic viewpoint.
The relationships in (5) represent $n$ linear equations between $2n$ variables,
$[X_1'(t), X_2'(t), \ldots, X_n'(t), X_1(t), X_2(t), \ldots, X_n(t)]$. These are not sufficient to
eliminate the $2(n - 1)$ variables not involving the subscript 1. However,
inasmuch as (5) holds for all values of $t$ we may differentiate it repeatedly until we
finally have the system of equations

(8)    $\begin{aligned}
-X_1^{(n)} + a_{11}X_1^{(n-1)} + \cdots + a_{1n}X_n^{(n-1)} &= 0 \\
\cdots\cdots\cdots \\
-X_n^{(n)} + a_{n1}X_1^{(n-1)} + \cdots + a_{nn}X_n^{(n-1)} &= 0 \\
-X_1^{(n-1)} + a_{11}X_1^{(n-2)} + \cdots + a_{1n}X_n^{(n-2)} &= 0 \\
\cdots\cdots\cdots \\
-X_1' + a_{11}X_1 + \cdots + a_{1n}X_n &= 0 \\
\cdots\cdots\cdots \\
-X_n' + a_{n1}X_1 + \cdots + a_{nn}X_n &= 0
\end{aligned}$

These are $n^2$ linear equations in $n^2 + n$ variables. We wish to eliminate all
variables which have a subscript other than one; namely, $(X_2, \ldots, X_n,
X_2', \ldots, X_n', \ldots, X_2^{(n)}, \ldots, X_n^{(n)})$. These are $(n + 1)(n - 1) = n^2 - 1$ in
number. We may utilize all but one of the equations to perform this
elimination. The remaining equation after substitution will be the desired high order
equation, and its coefficients are the polynomial coefficients.

Ordinarily one would solve all but one of the equations for the values of the
variables to be eliminated. These would then be substituted into the remaining
equation. Actually from the computational standpoint it is unnecessary to
solve completely for any unknowns. The so-called "forward" solution of the
usual Gauss-Doolittle technique automatically performs the elimination or
substitution, without necessary recourse to a "back" solution for the values of
the eliminated variables. These values are in any case of no interest.

There is no unique order in which the equations must be reduced. Indeed,
when one order fails because a leading principal minor vanishes, we may switch
to another. A suggested convenient order is given below. Let

$a = \left[\begin{array}{c|ccc} a_{11} & a_{12} & \cdots & a_{1n} \\ \hline a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{array}\right] = \begin{bmatrix} a_{11} & R \\ S & M \end{bmatrix}, \qquad I = (\delta_{ij}) \quad (i, j = 1, \ldots, n - 1).$

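Then, consider the partitioned matrix:

(9)    $H = \left[\begin{array}{ccccc|ccccc}
-I & M & 0 & \cdots & 0 & 0 & -S & 0 & \cdots & 0 \\
0 & -I & M & \cdots & 0 & 0 & 0 & -S & \cdots & 0 \\
\vdots & & & & \vdots & \vdots & & & & \vdots \\
0 & 0 & \cdots & -I & M & 0 & 0 & \cdots & 0 & -S \\ \hline
0 & 0 & \cdots & 0 & R & 0 & \cdots & 0 & 1 & -a_{11} \\
0 & 0 & \cdots & R & 0 & 0 & \cdots & 1 & -a_{11} & 0 \\
\vdots & & & & \vdots & \vdots & & & & \vdots \\
0 & R & \cdots & 0 & 0 & 1 & -a_{11} & \cdots & 0 & 0
\end{array}\right]$

It is simply the matrix of the equations in (8) with the variables
$(X_1, X_1', \ldots, X_1^{(n)})$ shifted over to the right-hand side, and with the equations
in which the variable one leads off being placed at the bottom.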

If the usual "forward" Doolittle technique is followed, then the final elements
computed, corresponding to the elements in the lower right-hand box, are the
coefficients $(1, p_1, p_2, \ldots, p_n)$. It is the present writer's experience that the
Crout form [6], like Dwyer's [7] the last word in Doolittle abbreviation, is to be
recommended, particularly since we are dealing with an asymmetrical matrix.
A clerk masters its ritual in a few minutes, and the speeds achieved once the
operations become mechanical are impressive.

For the trivial case of determining the coefficients corresponding to a two by
two matrix the $H$ matrix is of the form

(10)    $\left[\begin{array}{ccc|ccc}
-1 & a_{22} & 0 & 0 & -a_{21} & 0 \\
0 & -1 & a_{22} & 0 & 0 & -a_{21} \\ \hline
0 & 0 & a_{12} & 0 & 1 & -a_{11} \\
0 & a_{12} & 0 & 1 & -a_{11} & 0
\end{array}\right]$



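The Auxiliary Crout matrix becomes

(11)    $\left[\begin{array}{ccc|ccc}
-1 & a_{22} & 0 & 0 & -a_{21} & 0 \\
0 & -1 & a_{22} & 0 & 0 & -a_{21} \\ \hline
0 & 0 & a_{12} & 0 & 1 & -a_{11} \\
0 & -a_{12} & a_{22} & 1 & (-a_{11} - a_{22}) & (-a_{12}a_{21} + a_{11}a_{22})
\end{array}\right]$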










The answer in the lower right-hand box will immediately be recognized as the
correct one. I have found it convenient to vary the precise Crout routine by
dividing vertical columns by the "leading" diagonal element, rather than
horizontal rows. This is a matter of indifference and saves some
computations. As in the higher order cases, the presence of the identity matrix along
the diagonal reduces most of the computations to mere copying. Actually the
intelligent computer will soon notice that most of the copying may be eliminated
since the numbers in question are to be added in later in other sums of products.
After eliminating unknowns corresponding to the equations above the line on
which (9) is written, there results the system

$\left[\begin{array}{c|cccccc}
R & 0 & 0 & \cdots & 0 & 1 & -a_{11} \\
RM & 0 & \cdots & 0 & 1 & -a_{11} & -RS \\
RM^2 & 0 & \cdots & 1 & -a_{11} & -RS & -RMS \\
\vdots & & & & & & \vdots \\
RM^{n-1} & 1 & -a_{11} & -RS & -RMS & \cdots & -RM^{n-2}S
\end{array}\right]$

Thus, it would be simpler to start from this stage, avoiding unnecessary copying.
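Starting from this reduced stage, the whole computation can be sketched in a few lines. The following is an illustrative implementation, not the author's worksheet: it uses exact rational arithmetic, builds the $n$ rows $(RM^{k-1} \mid \ldots)$ directly, and carries out only the forward elimination. The function name and the handling of a vanishing pivot (rows among the first $n - 1$ are interchanged, otherwise an error is raised) are assumptions.

```python
from fractions import Fraction

def char_poly_by_elimination(a):
    """Coefficients [1, p_1, ..., p_n] of |lambda*I - a| by forward elimination."""
    n = len(a)
    a = [[Fraction(v) for v in row] for row in a]
    a11 = a[0][0]
    R = a[0][1:]                          # first row without a11
    S = [a[i][0] for i in range(1, n)]    # first column without a11
    M = [row[1:] for row in a[1:]]        # remaining (n-1) x (n-1) block

    def times_M(v):
        return [sum(v[i] * M[i][j] for i in range(n - 1)) for j in range(n - 1)]

    rows, v, rs = [], R[:], []            # v = R M^(k-1); rs[j] = R M^j S
    for k in range(1, n + 1):
        right = [Fraction(0)] * (n + 1)   # coefficients of X1^(n), ..., X1', X1
        right[n - k] = Fraction(1)
        right[n - k + 1] = -a11
        for j, c in enumerate(rs):
            right[n - k + 2 + j] = -c
        rows.append(v + right)
        rs.append(sum(x * y for x, y in zip(v, S)))
        v = times_M(v)

    for c in range(n - 1):                # eliminate the n - 1 unwanted unknowns
        pivot = next((r for r in range(c, n - 1) if rows[r][c] != 0), None)
        if pivot is None:
            raise ZeroDivisionError("singular case: this order of reduction fails")
        rows[c], rows[pivot] = rows[pivot], rows[c]
        for r in range(c + 1, n):
            f = rows[r][c] / rows[c][c]
            rows[r] = [x - f * y for x, y in zip(rows[r], rows[c])]
    return rows[-1][n - 1:]

coeffs = char_poly_by_elimination([[2, 1, 0], [1, 3, 1], [0, 1, 4]])
```

For the sample matrix the trace is 9, the sum of the principal minors of order two is 24 and the determinant is 18, which agrees with the signs and magnitudes the elimination returns.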

This remark shows that the present method is related to the Cayley-Hamilton
methods described in [2] and [3], since the above set is derivable from the set

$\left[\begin{array}{cc|ccccc}
a_{11}^{(1)} & A^{(1)} & 1 & 0 & 0 & \cdots & 0 \\
a_{11}^{(2)} & A^{(2)} & 0 & 1 & 0 & \cdots & 0 \\
a_{11}^{(3)} & A^{(3)} & 0 & 0 & 1 & \cdots & 0 \\
\vdots & \vdots & & & & & \vdots \\
a_{11}^{(n)} & A^{(n)} & 0 & 0 & 0 & \cdots & 1
\end{array}\right]$

The last named set appears in the Cayley-Hamilton method when the first row
of the powers of the original matrix are used in setting up $n$ equations to
determine our $n$ unknowns. Although related, the two methods are distinct since
in the Cayley-Hamilton method one would arrive at a different set of equations
after straightforward elimination of one variable, and since it would be shorter
to dispense with the identity matrix used in the Aitken method in favor of the
solution of a single set of equations by the usual Doolittle "back-solution."

The reader will easily see how the method may be modified to handle the more
general case of determining the coefficients of

(14)    $D(\lambda) = |c\lambda + a| = 0,$

where $c$ and $a$ are any matrices. The method also can be used to reduce a
polynomial equation involving a determinant of the nth order, each of whose
coefficients are of a given degree in $\lambda$, to a lower order determinant whose
coefficients are of higher degree in $\lambda$.





The present method derives the p's as the algebraic solution of high order
linear equations. It would therefore seem inferior to those methods which need
only solve a system of $n$ equations. However, two remarks are in order. The
matrix of the high order system can be written down immediately without
computation. Furthermore, most of the elements in the matrix are zeros, so
that a mere counting of the equations is not a true indication of the labor
involved.

3. Some comparisons between present method and other methods. Within
the brief compass of the present work it is not possible to give an exhaustive
appraisal of the comparative computational efficiencies of the methods
mentioned. In general, a computing method is to be judged in terms of the number
of multiplications that it involves, although other considerations such as the
number of additions, the magnitude and sign of the numbers handled, the
repetitiveness of the operations involved, the adaptability to punch card
machinery, etc. are modifying factors. In this discussion the power of a method
will be taken to be an inverse function of the number of multiplications that it
involves.

It may be said first of all that inasmuch as the minimum number of
multiplications involved in computing an nth order determinant is of the order of
$n^3$ even with the most efficient "pivotal" methods, direct computation of the
coefficients by principal minors involves, for sufficiently large $n$, computation
of the order of $n^4$. The same is true of the Wilson method described above.
The Horst method, and any other that requires the explicit $n$ powers of an nth
order matrix, also asymptotically requires multiplications of the order of $n^4$.
This does not mean that the above three methods are equally powerful for small
$n$, nor even asymptotically, since the coefficients of the $n^4$ term in the formula
for the requisite number of multiplications may not be equal. In fact, Reiersøl
[1] has shown that his method is better than Horst's for small $n$, but
asymptotically less powerful.

It can also be shown that the Cayley-Hamilton methods which simply involve
products of the powers of a matrix with row or column vectors are asymptotically
more powerful than any of the above methods, the work only increasing as the
cube of $n$. This is true whether the longer Aitken form of reduction is
employed or whether the usual Doolittle back-solution is followed. The present
method is also an efficient one in the sense that its requisite number of
multiplications increases with the cube of $n$. For small values of $n$ and asymptotically
it can be shown to be more powerful than the Cayley-Hamilton method which
uses the Aitken method of reduction, although in the limit as $n$ becomes large
the ratio of the powers of the two methods approaches unity.

It is of the greatest interest to compare the power of the new method with the
shorter Doolittle C-H method. It can easily be shown that the coefficients of $n^3$ in
the expressions giving the respective requisite number of multiplications differ
in such a way as to make the C-H method more powerful after some value of $n$,
the ratio of the respective powers approaching the limit 8/9. However, for
low order matrices the new method is the more powerful. The reader may
easily verify this for the case of a second order matrix. Below a sixth order
matrix the present method seems to involve the smaller number of
multiplications. For a sixth order matrix the two methods seem to involve the same
number of multiplications (multiplications by unity not being counted). For
matrices of the seventh order or higher the C-H method seems to be optimal.

As compared to an explicit evaluation of the coefficients by a straightforward
computation of principal minors according to the fundamental definition of a
determinant as the sum of signed products of elements, all of the methods
discussed are efficient, since the work in the former increases faster than any
power of $n$. However, for each of the methods discussed, in singular cases the
method of reduction may fail so that modified procedures will be necessary. In
actual practice such singularities will "almost never" be encountered. But in
the neighborhood of such singular points the computations become extremely
sensitive to any rounding off of digits. Consequently, it is from the nature of
the case impossible ever to develop exact rules for the maximum error involved
in any given calculation.

REFERENCES

[1] O. Reiersøl, "Recurrent Computations of all Principal Minors of a Determinant,"
    Annals of Math. Stat., Vol. 11 (1940), pp. 193-198.

[2] R. A. Frazer, W. J. Duncan, and A. R. Collar, Elementary Matrices, pp. 141-142.

[3] M. M. Flood, "A Computational Procedure for the Method of Principal Components,"
    Psychometrika, Vol. 5 (1940), pp. 169-172.

[4] P. Horst, "A Method for Determining the Coefficients of a Characteristic Equation,"
    Annals of Math. Stat., Vol. 6 (1935), pp. 83-84.

[5] F. R. Moulton, Differential Equations, pp. 6-9.

[6] P. D. Crout, "A Short Method for Evaluating Determinants and Solving Systems of
    Linear Equations with Real or Complex Coefficients," American Institute of
    Electrical Engineers, Vol. 60 (1941).

[7] P. S. Dwyer, "The Solution of Simultaneous Equations," Psychometrika, Vol. 6 (1941),
    pp. 101-129.



NOTES 

This section is devoted to brief research and expository articles, notes on methodology 
and other short items. 


A NOTE ON THE THEORY OF MOMENT GENERATING FUNCTIONS 


By J. H. Curtiss
Cornell University 


Let $X$ be a one-dimensional variate and let $F(x)$ be its distribution function.¹
The function

$G(\alpha) = \int_{-\infty}^{\infty} e^{\alpha x}\, dF(x), \qquad \alpha \text{ real},$

in which the integral is assumed to converge for $\alpha$ in some neighborhood of the
origin, is called the moment generating function of $X$. In dealing with certain
distribution problems, this function has been widely used by statisticians, and
especially by the English writers, in place of the closely-related characteristic
function $f(t) = E(e^{itX})$. It is known that a characteristic function uniquely
determines the corresponding distribution, and that if a sequence of
characteristic functions approaches a limit, the corresponding sequence of distribution
functions does likewise. (These results are more accurately stated below.) The
appropriate analogues for the moment generating function of these theorems are
apparently not too readily accessible in the literature, if they have been treated
at all, and it seems worthwhile to record them in this note.

Henceforth we abbreviate distribution function to d.f., moment generating
function to m.g.f., and characteristic function to c.f. The variables $\alpha$ and $t$ will
always be real, in contradistinction to the complex variable $s$, to be introduced
in the next paragraph.

The uniqueness property of the c.f. may be stated as follows: If $F_1(x)$ and
$f_1(t)$ are the d.f. and c.f. of one variate, and $F_2(x)$ and $f_2(t)$ are those of another,
and if $f_1(t) \equiv f_2(t)$ for all² $t$, then $F_1(x) \equiv F_2(x)$ for all $x$ [1, p. 28]. To study the
corresponding situation for the m.g.f., we first observe that

$\varphi(s) = E(e^{sX}) = \int_{-\infty}^{\infty} e^{sx}\, dF(x), \qquad s \text{ complex},$


¹ Or cumulative frequency function; our notation and terminology are uniform with
that of [1] except for the use of the term "variate" instead of "random variable."

² It is possible for two non-identical distributions to have c.f.'s which are identical
throughout an interval of values of $t$ containing the origin; an example is given in [4], p. 100.
The author is obliged to Professor Wintner and Professor Feller for pointing out the
existence of this particular example.




is a bilateral Laplace-Stieltjes transform. If such a transform exists for real
values of $s$ in an interval $-\alpha_1 < s < \alpha_1$, $\alpha_1 > 0$, it must exist for all complex
values of $s$ in the strip $-\alpha_1 < \Re s < \alpha_1$, and represent there an analytic
function of $s$ [5, p. 238]. Evidently $\varphi(\alpha) = G(\alpha)$, $\varphi(it) = f(t)$. Suppose now that
$F_1(x)$, $G_1(\alpha)$, $f_1(t)$ are the d.f., m.g.f., and c.f. of a variate $X_1$, and $F_2(x)$, $G_2(\alpha)$,
$f_2(t)$ are those of $X_2$. Let $\varphi_1(s) = E(e^{sX_1})$, $\varphi_2(s) = E(e^{sX_2})$, $s$ complex. If
$G_1(\alpha) \equiv G_2(\alpha)$ for all $\alpha$ in some interval, however small, containing the origin,
then by a familiar property of analytic functions [2, p. 116], $\varphi_1(s) \equiv \varphi_2(s)$
throughout the corresponding strip of analyticity, and so on the axis of
imaginaries. This means that $f_1(t) \equiv f_2(t)$, all $t$, and therefore $F_1(x) \equiv F_2(x)$. We
have:

Theorem 1. A m.g.f. existing in some neighborhood of $\alpha = 0$ uniquely
determines the corresponding distribution.

We turn now to distributions of variable form. Because certain of the
versions to be found in the literature are incomplete, it seems worth while to give
here a full statement of the basic limit theorem for sequences of c.f.'s, due to
P. Lévy and sometimes called Lévy's Continuity Theorem [4, pp. 48-50].

Theorem 2. Let the distribution of a variate $X_n$ depend on a parameter $n$, and
let $F_n(x)$ and $f_n(t)$ be the d.f. and c.f. of $X_n$.

(a) If there exists a variate $X$ with d.f. $F(x)$ such that $\lim_{n\to\infty} F_n(x) = F(x)$ at
every continuity point of $F(x)$, then $\lim_{n\to\infty} f_n(t) = f(t)$ uniformly in each finite
interval on the $t$-axis, where $f(t)$ is the c.f. of $X$.

(b) If there exists a function $f(t)$ such that $\lim_{n\to\infty} f_n(t) = f(t)$, all $t$,³ and
uniformly⁴ in some open interval containing the origin, then there exists a variate $X$
with d.f. $F(x)$ such that $\lim_{n\to\infty} F_n(x) = F(x)$ at each continuity point and uniformly
in any finite or infinite interval of continuity of $F(x)$. The c.f. of $X$ is $f(t)$, and
$\lim_{n\to\infty} f_n(t) = f(t)$ uniformly in each finite interval.

We now develop the corresponding theorem for the m.g.f. In the first place,
it is not difficult to see that part (a) will have no direct analogue, even if we add
to the hypothesis the conditions that the m.g.f. of $X_n$ exists in some fixed interval
for all $n$ and that the m.g.f. of $X$ also exists in some interval. For example,
the d.f.

$F_n(x) = \begin{cases} 0, & x < -n \\ \tfrac{1}{2} + k_n \arctan nx, & -n \le x < n \\ 1, & x \ge n \end{cases}$


* The condition that $\lim_{n\to\infty} f_n(t)$ exist on at least an everywhere dense set of points on the $t$-axis is essential to the proof as given in Cramér's book [1, pp. 29-30], but is omitted in his statement of the theorem, and is not stated clearly in certain other treatments by other authors.

† For a discussion of this uniformity condition, and possible alternatives, see [1, p. 29 (footnote)]. The condition may, for instance, be replaced by the assumption that $f(t)$ is continuous at $t = 0$.





where $k_n = 1/(2 \arctan n^2)$, clearly tends as $n \to \infty$ to the d.f.

$$F(x) = \begin{cases} 0, & x < 0 \\ 1, & x \ge 0 \end{cases}$$

at all points of continuity of the latter d.f. The m.g.f. corresponding to $F_n(x)$ is

$$G_n(\alpha) = n k_n \int_{-n}^{n} \frac{e^{\alpha x}}{1 + n^2 x^2}\,dx,$$

which for each $n$ exists for all $\alpha$, and the m.g.f. corresponding to $F(x)$ is simply the constant 1. Clearly



n 

1 + 


dx, 


and from this it can easily be verified that $\lim_{n\to\infty} G_n(\alpha) = \infty$ if $\alpha \neq 0$. In short, mere convergence of a sequence of d.f.'s tells little about the behavior of the corresponding sequence of m.g.f.'s.
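The behavior of this example can be checked numerically. The sketch below is only an illustration, not part of the original argument: it approximates $G_n(\alpha) = n k_n \int_{-n}^{n} e^{\alpha x}/(1 + n^2 x^2)\,dx$ by Simpson's rule, and the function name `mgf_Gn` and the step count are assumptions made here.

```python
import math

def mgf_Gn(alpha, n, steps=20000):
    """Simpson approximation of G_n(alpha) for the example d.f. F_n (hypothetical helper).

    G_n(alpha) = n * k_n * integral over [-n, n] of exp(alpha*x) / (1 + n^2 x^2) dx,
    with k_n = 1 / (2 * arctan(n^2)). `steps` must be even.
    """
    k_n = 1.0 / (2.0 * math.atan(n * n))
    h = 2.0 * n / steps

    def integrand(x):
        return math.exp(alpha * x) / (1.0 + (n * x) ** 2)

    total = integrand(-n) + integrand(n)
    for i in range(1, steps):
        weight = 4 if i % 2 else 2        # Simpson weights 4, 2, 4, ...
        total += weight * integrand(-n + i * h)
    return n * k_n * total * h / 3.0

# At alpha = 0 the value stays near 1; at alpha = 1 it grows with n.
for n in (5, 10, 20):
    print(n, mgf_Gn(0.0, n), mgf_Gn(1.0, n))
```

The total mass stays at 1 for every $n$, while the value at any fixed $\alpha \neq 0$ is driven by the thin tails near $x = \pm n$.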

Part (b) assumes the following form:

Theorem 3. Let $F_n(x)$ and $G_n(\alpha)$ be respectively the d.f. and m.g.f. of a variate $X_n$. If $G_n(\alpha)$ exists for $|\alpha| < \alpha_1$ and for all $n \ge n_0$, and if there exists a finite-valued function $G(\alpha)$ defined for $|\alpha| \le \alpha_2 < \alpha_1$, $\alpha_2 > 0$, such that $\lim_{n\to\infty} G_n(\alpha) = G(\alpha)$, $|\alpha| \le \alpha_2$, then there exists a variate $X$ with d.f. $F(x)$ such that $\lim_{n\to\infty} F_n(x) = F(x)$ at each continuity point and uniformly in each finite or infinite interval of continuity of $F(x)$. The m.g.f. of $X$ exists for $|\alpha| \le \alpha_2$ and is equal to $G(\alpha)$ in that interval.

To prove the theorem, we introduce the Laplace transform $\varphi_n(s) = E(e^{sX_n})$ and observe that $|\varphi_n(s)| \le \varphi_n(\alpha) = G_n(\alpha)$, $s = \alpha + it$, $n \ge n_0$, for any $s$ in the strip $-\alpha_1 < \Re s < \alpha_1$. By applying Leibniz's rule for differentiation under an integral sign (extended to Stieltjes integrals), we find [5, p. 240] that

$$G_n''(\alpha) = \int_{-\infty}^{\infty} x^2 e^{\alpha x}\,dF_n(x), \qquad |\alpha| < \alpha_1,$$

from which it appears that $G_n''(\alpha) > 0$, $|\alpha| < \alpha_1$. This means that the function $G_n(\alpha)$ assumes its maximum value in the interval $|\alpha| \le \alpha_2$ at either or both endpoints of the interval. But of course $G_n(\alpha_2)$ and $G_n(-\alpha_2)$ both approach finite limits as $n$ becomes infinite, so it follows that the sequence $\{G_n(\alpha)\}$, $n \ge n_0$, is uniformly bounded in the interval $|\alpha| \le \alpha_2$. Thus the sequence $\{|\varphi_n(s)|\}$, $n \ge n_0$, is uniformly bounded in the strip $-\alpha_2 \le \Re s \le \alpha_2$, and moreover has a limit at each point of an infinite set possessing a limit point in the strip (i.e., at each point of the interval $-\alpha_2 \le s \le \alpha_2$). So by Vitali's Theorem [3, pp. 156-160, 240], there exists an analytic function $\varphi^*(s)$ such that $\lim_{n\to\infty} \varphi_n(s) = \varphi^*(s)$ uniformly in each bounded closed subregion of the strip $-\alpha_2 < \Re s < \alpha_2$. Since $\varphi_n(it)$ is the c.f. of $X_n$, the existence of the limiting distribution follows from Theorem 2(b).





Of course, $\varphi^*(\alpha) = G(\alpha)$, $-\alpha_2 < \alpha < \alpha_2$. It remains to show that $\varphi^*(\alpha)$ is the m.g.f. of $X$. Theorem 2(b) states that $\varphi^*(it)$ is the c.f. of $X$. If we can show that the function $\varphi(s) = E(e^{sX})$ exists at least in the strip $-\alpha_2 < \Re s < \alpha_2$, then since $\varphi(s) \equiv \varphi^*(s)$ on the axis of imaginaries, the equality must be valid in the entire strip, and so in particular on the interval of the real axis inside the strip.

It will suffice for this purpose to show that $\varphi(\alpha)$ exists for $-\alpha_2 \le \alpha \le \alpha_2$. Suppose indeed that $\varphi(\alpha)$ does not exist at some point $\alpha = \alpha_0$ in this interval. That means that if

$$M = [\text{l.u.b. } G_n(\alpha_0),\ n \ge n_0],$$

we can find a real number $A$ such that

(1) $$\int_{-A}^{A} e^{\alpha_0 x}\,dF(x) > M.$$

But

(2) $$\int_{-A}^{A} e^{\alpha_0 x}\,dF(x) = \int_{-A}^{A} e^{\alpha_0 x}\,dF_n(x) + \left[\int_{-A}^{A} e^{\alpha_0 x}\,dF(x) - \int_{-A}^{A} e^{\alpha_0 x}\,dF_n(x)\right].$$

Since $\lim_{n\to\infty} F_n(x) = F(x)$ at all continuity points of $F(x)$, and so on an everywhere dense set of points, the Helly-Bray Theorem [5, p. 31] states that the expression in brackets in (2) approaches zero as $n$ becomes infinite. Meanwhile

$$\int_{-A}^{A} e^{\alpha_0 x}\,dF_n(x) \le \int_{-\infty}^{\infty} e^{\alpha_0 x}\,dF_n(x) \le M, \qquad n \ge n_0.$$

Thus we arrive at the conclusion that the left member of (2) must be less than or equal to $M$, which contradicts (1).

To be sure, we have only proved that the m.g.f. of $X$ is equal to $\varphi^*(\alpha)$ or $G(\alpha)$ in the open interval $-\alpha_2 < \alpha < \alpha_2$, and not in the corresponding closed interval, as promised. But because of the absolute (and therefore uniform) convergence of the integrals defining $G_n(\alpha)$ and $\varphi(\alpha)$, these functions must be continuous in the closed interval $-\alpha_2 \le \alpha \le \alpha_2$. Since $\lim_{n\to\infty} G_n(\alpha) = G(\alpha)$ in this interval, $G(\alpha)$ must also be continuous there. This implies that $\varphi(\alpha)$, the m.g.f. of $X$, is identically equal to $G(\alpha)$ in the closed interval, and the proof is complete.

It is perhaps worth while to point out explicitly that in the course of the foregoing argument we have proved this proposition: If a sequence of m.g.f.'s converges in an open interval containing $\alpha = 0$, then it must converge uniformly in every closed subinterval of the open interval, and the limit function is itself a m.g.f.
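As a concrete instance of Theorem 3, one may take a sequence whose m.g.f.'s are known in closed form. The sketch below uses the binomial-to-Poisson limit, which is an illustrative choice made here and does not appear in the paper; the function names and the grid are likewise assumptions.

```python
import math

def mgf_binomial(alpha, n, lam):
    """M.g.f. of a binomial variate with n trials and success probability lam / n."""
    return (1.0 + (lam / n) * (math.exp(alpha) - 1.0)) ** n

def mgf_poisson(alpha, lam):
    """M.g.f. of a Poisson variate with mean lam."""
    return math.exp(lam * (math.exp(alpha) - 1.0))

def sup_gap(n, lam=2.0, alpha_max=1.0, grid=201):
    """Largest |G_n(alpha) - G(alpha)| over an even grid on [-alpha_max, alpha_max]."""
    alphas = [-alpha_max + 2.0 * alpha_max * i / (grid - 1) for i in range(grid)]
    return max(abs(mgf_binomial(a, n, lam) - mgf_poisson(a, lam)) for a in alphas)

# The gap shrinks on the whole closed interval as n grows.
for n in (10, 100, 10000):
    print(n, sup_gap(n))
```

Here the m.g.f.'s converge on a closed interval around the origin, so Theorem 3 gives convergence of the d.f.'s, and the limit function is the m.g.f. of the limiting variate.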

REFERENCES

[1] H. Cramér, Random Variables and Probability Distributions.

[2] D. R. Curtiss, Analytic Functions of a Complex Variable, Chicago, 1926.

[3] P. Dienes, The Taylor Series, Oxford, 1931.

[4] P. Lévy, Théorie de l'Addition des Variables Aléatoires, Paris.

[5] D. V. Widder, The Laplace Transform, Princeton, 1941.



434 


ABRAHAM WALD 


ON THE POWER FUNCTION OF THE ANALYSIS OF VARIANCE TEST¹

By Abraham Wald
Columbia University

It is known² that the general problem of the analysis of variance can be reduced by an orthogonal transformation to the following canonical form: Let the variates $y_1, \dots, y_p, z_1, \dots, z_n$ be independently and normally distributed with a common unknown variance $\sigma^2$. The mean values of $z_1, \dots, z_n$ are known to be zero, and the mean values $\eta_1, \dots, \eta_p$ of the variates $y_1, \dots, y_p$ are unknown. The canonical form of the analysis of variance test is the test of the hypothesis that

(1) $$\eta_1 = \eta_2 = \cdots = \eta_r = 0 \qquad (r \le p)$$

where a single observation is made on each of the variates $y_1, \dots, y_p, z_1, \dots, z_n$.

In the theory of the analysis of variance the test of the hypothesis (1) is based on the critical region

(2) $$\frac{y_1^2 + \cdots + y_r^2}{z_1^2 + \cdots + z_n^2} \ge c$$

where the constant $c$ is chosen so that the size of the critical region is equal to the level of significance $\alpha$ we wish to have. The critical region (2) is identical with the critical region

(3) $$\frac{y_1^2 + \cdots + y_r^2}{y_1^2 + \cdots + y_r^2 + z_1^2 + \cdots + z_n^2} \ge c' = \frac{c}{c + 1}.$$


It is known that the power function of the critical region (3) depends only on the single parameter

(4) $$\lambda = \frac{1}{\sigma^2} \sum_{i=1}^{r} \eta_i^2.$$
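The two statements above, that regions (2) and (3) coincide and that the power of (3) depends on the means only through $\lambda$, can be illustrated by simulation. The sketch below is an assumption-laden illustration: the function names, the constant $c = 1$ (not calibrated to any significance level), the seeds and the trial counts are all choices made here.

```python
import random

def in_region_2(y, z, r, c):
    """Region (2): (y_1^2 + ... + y_r^2) / (z_1^2 + ... + z_n^2) >= c."""
    num = sum(v * v for v in y[:r])
    den = sum(v * v for v in z)
    return num >= c * den

def in_region_3(y, z, r, c):
    """Region (3): same numerator over numerator + sum of z^2, cut at c' = c / (c + 1)."""
    num = sum(v * v for v in y[:r])
    den = num + sum(v * v for v in z)
    return num >= (c / (c + 1.0)) * den

def estimated_power(eta, n, r, c, sigma=1.0, trials=20000, seed=0):
    """Monte Carlo frequency with which the sample point falls in region (3)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        y = [rng.gauss(m, sigma) for m in eta]            # y_i has mean eta_i
        z = [rng.gauss(0.0, sigma) for _ in range(n)]     # z_j has mean zero
        hits += in_region_3(y, z, r, c)
    return hits / trials

# Two mean vectors with the same lambda = (eta_1^2 + eta_2^2) / sigma^2 = 4.
print(estimated_power([2.0, 0.0, 0.5], n=5, r=2, c=1.0, seed=1))
print(estimated_power([2 ** 0.5, 2 ** 0.5, -3.0], n=5, r=2, c=1.0, seed=2))
```

The two printed frequencies should agree up to simulation error, even though the individual means, including the nuisance mean $\eta_3$, differ.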

Denote the power function of the critical region (3) by $\beta_0(\lambda)$. P. L. Hsu has proved³ the following optimum property of the region (3): Let $W$ be a critical region which satisfies the following two conditions:

(a) The size of $W$ is equal to the size of the region (3).

¹ Presented at a joint meeting of the Institute of Mathematical Statistics and the American Mathematical Society in New York, December, 1941.

² See for instance P. C. Tang, "The power function of the analysis of variance tests," Stat. Res. Mem., Vol. 2, 1938.

³ P. L. Hsu, "Analysis of variance from the power function standpoint," Biometrika, January, 1941.





(b) The power function of $W$ depends on the single parameter $\lambda$.

Then $\beta(\lambda) \le \beta_0(\lambda)$, where $\beta(\lambda)$ denotes the power function of $W$.

Condition (b) is a serious restriction in Hsu's result. In this paper we shall prove an optimum property of $\beta_0(\lambda)$ where $\beta_0(\lambda)$ is compared with the power function of any other critical region of size equal to that of (3).

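For any given values $\eta'_{r+1}, \dots, \eta'_p$, $\sigma'$ and $\lambda$ denote by $S(\eta'_{r+1}, \dots, \eta'_p, \sigma', \lambda)$ the sphere defined by the equations

(5) $$\eta_1^2 + \cdots + \eta_r^2 = \lambda \sigma'^2; \qquad \eta_i = \eta'_i \ (i = r+1, \dots, p); \qquad \sigma = \sigma'.$$

For any region $W$ denote by $\beta_W(\eta_1, \dots, \eta_p, \sigma)$ the power function of $W$, i.e. $\beta_W(\eta_1, \dots, \eta_p, \sigma)$ denotes the probability that the sample point will fall within $W$ calculated under the assumption that $\eta_1, \dots, \eta_p$ and $\sigma$ are the true values of the parameters. We will denote by $\gamma_W(\eta'_{r+1}, \dots, \eta'_p, \sigma', \lambda)$ the integral of the power function $\beta_W(\eta_1, \dots, \eta_p, \sigma)$ over the surface $S(\eta'_{r+1}, \dots, \eta'_p, \sigma', \lambda)$ divided by the area of $S(\eta'_{r+1}, \dots, \eta'_p, \sigma', \lambda)$, i.e.

(6) $$\gamma_W(\eta'_{r+1}, \dots, \eta'_p, \sigma', \lambda) = \frac{\displaystyle\int_{S(\eta'_{r+1}, \dots, \eta'_p, \sigma', \lambda)} \beta_W(\eta_1, \dots, \eta_p, \sigma)\,dA}{\displaystyle\int_{S(\eta'_{r+1}, \dots, \eta'_p, \sigma', \lambda)} dA}.$$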


We will prove the following

Theorem: If $W$ is a critical region of size equal to that of (3), i.e. $\beta_W(0, \dots, 0, \eta_{r+1}, \dots, \eta_p, \sigma) = \beta_0(0)$, then

(7) $$\gamma_W(\eta'_{r+1}, \dots, \eta'_p, \sigma', \lambda) \le \beta_0(\lambda)$$

for arbitrary values $\eta'_{r+1}, \dots, \eta'_p$, $\sigma'$ and $\lambda$.

If $W$ satisfies Hsu's condition (b) then the power function $\beta_W(\eta_1, \dots, \eta_p, \sigma)$ is constant on the surface $S(\eta_{r+1}, \dots, \eta_p, \sigma, \lambda)$ and therefore $\gamma_W(\eta_{r+1}, \dots, \eta_p, \sigma, \lambda) = \beta_W(\eta_1, \dots, \eta_p, \sigma)$. Hence Hsu's result is an immediate consequence of our Theorem.

Denote $\left|\sqrt{y_1^2 + \cdots + y_r^2 + z_1^2 + \cdots + z_n^2}\right|$ by $t$ and for any values $a_{r+1}, \dots, a_p, b$ let $R(a_{r+1}, \dots, a_p, b)$ be the set of all sample points for which $y_i = a_i$ $(i = r+1, \dots, p)$ and $t = b$.

For any region $W$ of the sample space we denote by $W(y_{r+1}, \dots, y_p, t)$ the common part of $W$ and $R(y_{r+1}, \dots, y_p, t)$.
In order to prove our Theorem we first show the validity of the following

Lemma 1: For any critical region $Z$ there exists a function $\varphi_Z(y_{r+1}, \dots, y_p, t)$ of the variables $y_{r+1}, \dots, y_p, t$ such that the critical region $Z^*$ defined by the inequality

$$y_1^2 + \cdots + y_r^2 \ge \varphi_Z(y_{r+1}, \dots, y_p, t)$$

satisfies the following two conditions:

(a) $\beta_Z(0, \dots, 0, \eta_{r+1}, \dots, \eta_p, \sigma) = \beta_{Z^*}(0, \dots, 0, \eta_{r+1}, \dots, \eta_p, \sigma)$;

(b) $\gamma_Z(\eta_{r+1}, \dots, \eta_p, \sigma, \lambda) \le \gamma_{Z^*}(\eta_{r+1}, \dots, \eta_p, \sigma, \lambda)$.

Proof. Denote by $P_Z(y_{r+1}, \dots, y_p, t)$ the conditional probability of $Z(y_{r+1}, \dots, y_p, t)$ calculated under the condition that the sample point lies in $R(y_{r+1}, \dots, y_p, t)$ and under the assumption that $\eta_1 = \cdots = \eta_r = 0$. Denote by $P(d, t)$ the conditional probability that

$$y_1^2 + \cdots + y_r^2 \ge d$$

calculated under the condition that the sample point lies in $R(y_{r+1}, \dots, y_p, t)$ and under the assumption that $\eta_1 = \cdots = \eta_r = 0$. It is easy to verify that the values of $P(d, t)$ and $P_Z(y_{r+1}, \dots, y_p, t)$ do not depend on the unknown parameters $\eta_{r+1}, \dots, \eta_p, \sigma$. Since $P(d, t)$ is a continuous function of $d$ and since $P(t^2, t) = 0$, there exists a function $\varphi_Z(y_{r+1}, \dots, y_p, t)$ such that

$$P[\varphi_Z(y_{r+1}, \dots, y_p, t), t] = P_Z(y_{r+1}, \dots, y_p, t).$$

For this function $\varphi_Z(y_{r+1}, \dots, y_p, t)$ the region $Z^*$ certainly satisfies condition (a) of Lemma 1. We will show that condition (b) is also satisfied. Consider the ratio


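(8) $$\frac{\displaystyle\int_{S(\eta_{r+1}, \dots, \eta_p, \sigma, \lambda)} \exp\left[-\frac{1}{2\sigma^2}\left(\sum_{i=1}^{p} (y_i - \eta_i)^2 + \sum_{j=1}^{n} z_j^2\right)\right] dA}{\exp\left[-\dfrac{1}{2\sigma^2}\left(\displaystyle\sum_{i=1}^{r} y_i^2 + \sum_{i=r+1}^{p} (y_i - \eta_i)^2 + \sum_{j=1}^{n} z_j^2\right)\right]} = e^{-\lambda/2} \int_{S(\eta_{r+1}, \dots, \eta_p, \sigma, \lambda)} e^{\frac{1}{\sigma^2}\sum_{i=1}^{r} \eta_i y_i}\,dA.$$

Denote $\left|\sqrt{\sum_{i=1}^{r} y_i^2}\right|$ by $r_y$. Then we have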


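(9) $$\int_{S(\eta_{r+1}, \dots, \eta_p, \sigma, \lambda)} e^{\frac{1}{\sigma^2}\sum_{i=1}^{r} \eta_i y_i}\,dA = \int_{S(\eta_{r+1}, \dots, \eta_p, \sigma, \lambda)} e^{r_y \sqrt{\lambda} \cos[\alpha(\eta)]/\sigma}\,dA,$$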

where $\alpha(\eta)$ denotes the angle $(0 \le \alpha(\eta) \le \pi)$ between the vector $y$ with the components $y_1, \dots, y_r$ and the vector $\eta$ with the components $\eta_1, \dots, \eta_r$. Because of the symmetry of the sphere, the value of the right hand side of (9) is not changed if we substitute $\beta(\eta)$ for $\alpha(\eta)$ where $\beta(\eta)$ denotes the angle $(0 \le \beta(\eta) \le \pi)$ between the vector $\eta$ and an arbitrarily chosen fixed vector $u$. Hence the value of the right hand side of (9) depends only on $r_y$, i.e.

(10) $$\int_{S(\eta_{r+1}, \dots, \eta_p, \sigma, \lambda)} e^{r_y \sqrt{\lambda} \cos[\beta(\eta)]/\sigma}\,dA = I(r_y).$$

Now we will show that $I(r_y)$ is a monotonically increasing function of $r_y$. We have

(11) $$\frac{dI(r_y)}{dr_y} = \frac{\sqrt{\lambda}}{\sigma} \int_{S(\eta_{r+1}, \dots, \eta_p, \sigma, \lambda)} \cos[\beta(\eta)]\, e^{r_y \sqrt{\lambda} \cos[\beta(\eta)]/\sigma}\,dA.$$

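Denote by $\omega_1$ the subset of $S(\eta_{r+1}, \dots, \eta_p, \sigma, \lambda)$ in which $0 \le \beta(\eta) \le \frac{\pi}{2}$ and by $\omega_2$ the subset in which $\frac{\pi}{2} \le \beta(\eta) \le \pi$. Because of the symmetry of the sphere we obviously have

(12) $$\int_{\omega_2} \cos[\beta(\eta)]\, e^{r_y \sqrt{\lambda} \cos[\beta(\eta)]/\sigma}\,dA = \int_{\omega_1} \cos[\pi - \beta(\eta)]\, e^{r_y \sqrt{\lambda} \cos[\pi - \beta(\eta)]/\sigma}\,dA = -\int_{\omega_1} \cos[\beta(\eta)]\, e^{-r_y \sqrt{\lambda} \cos[\beta(\eta)]/\sigma}\,dA.$$

Hence

(13) $$\frac{dI(r_y)}{dr_y} = \frac{\sqrt{\lambda}}{\sigma} \int_{\omega_1} \cos[\beta(\eta)] \left\{ e^{r_y \sqrt{\lambda} \cos[\beta(\eta)]/\sigma} - e^{-r_y \sqrt{\lambda} \cos[\beta(\eta)]/\sigma} \right\} dA.$$

The right hand side of (13) is positive. Hence $I(r_y)$, and therefore also the left hand side of (8), is a monotonically increasing function of $r_y$.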
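The monotonicity of $I(r_y)$ can be seen numerically in the simplest case. The sketch below assumes $r = 2$, so that the sphere reduces to a circle of radius $\sigma\sqrt{\lambda}$ and $dA$ becomes that radius times $d\beta$; the function name `I_circle` and the midpoint rule are choices made here for illustration only.

```python
import math

def I_circle(r_y, lam, sigma=1.0, steps=2000):
    """I(r_y) for r = 2, i.e. the integral of exp(r_y*sqrt(lam)*cos(beta)/sigma)
    over a circle of radius sigma*sqrt(lam), by the midpoint rule in beta."""
    radius = sigma * math.sqrt(lam)
    h = 2.0 * math.pi / steps
    total = 0.0
    for i in range(steps):
        beta = (i + 0.5) * h          # angle to the fixed reference direction
        total += math.exp(r_y * math.sqrt(lam) * math.cos(beta) / sigma)
    return radius * total * h         # dA = radius * d(beta)

# I(0) is the circumference; the values then increase with r_y.
for r_y in (0.0, 0.5, 1.0, 2.0):
    print(r_y, I_circle(r_y, lam=4.0))
```

The gain on the half of the circle where $\cos\beta > 0$ outweighs the loss on the other half, which is the content of the step from (12) to (13).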

Let $P_1(y'_{r+1}, \dots, y'_p, t', \eta_1, \dots, \eta_p, \sigma)\,dy_{r+1} \cdots dy_p\,dt$ be the probability that the sample point will fall in the intersection of $Z$ and the set

$$y'_i - \tfrac{1}{2}dy_i < y_i < y'_i + \tfrac{1}{2}dy_i \ (i = r+1, \dots, p), \qquad t' - \tfrac{1}{2}dt < t < t' + \tfrac{1}{2}dt.$$

Similarly let $P_2(y'_{r+1}, \dots, y'_p, t', \eta_1, \dots, \eta_p, \sigma)\,dy_{r+1} \cdots dy_p\,dt$ be the unconditional probability that the sample point will fall in the intersection of $Z^*$ and the set

$$y'_i - \tfrac{1}{2}dy_i < y_i < y'_i + \tfrac{1}{2}dy_i \ (i = r+1, \dots, p), \qquad t' - \tfrac{1}{2}dt < t < t' + \tfrac{1}{2}dt.$$

Since the function $\varphi_Z(y_{r+1}, \dots, y_p, t)$ has been defined so that

$$P_Z(y_{r+1}, \dots, y_p, t) = P[\varphi_Z(y_{r+1}, \dots, y_p, t), t],$$

we obviously have

(14) $$P_1(y_{r+1}, \dots, y_p, t, 0, \dots, 0, \eta_{r+1}, \dots, \eta_p, \sigma) = P_2(y_{r+1}, \dots, y_p, t, 0, \dots, 0, \eta_{r+1}, \dots, \eta_p, \sigma).$$

Using a lemma⁴ by Neyman and Pearson, we easily obtain

(15) $$\int_{S(\eta_{r+1}, \dots, \eta_p, \sigma, \lambda)} P_2(y_{r+1}, \dots, y_p, t, \eta_1, \dots, \eta_p, \sigma)\,dA \ge \int_{S(\eta_{r+1}, \dots, \eta_p, \sigma, \lambda)} P_1(y_{r+1}, \dots, y_p, t, \eta_1, \dots, \eta_p, \sigma)\,dA$$


⁴ J. Neyman and E. S. Pearson, "Contributions to the theory of testing statistical hypotheses," Stat. Res. Mem., Vol. 1, London, 1936.





from (14) and the fact that the left hand side of (8) is a monotonically increasing function of $r_y^2 = y_1^2 + \cdots + y_r^2$. Condition (b) is an immediate consequence of (15). Hence Lemma 1 is proved.

For the proof of our theorem we will also need the following

Lemma 2: Let $v_1, \dots, v_k$ be $k$ normally and independently distributed variates with a common variance $\sigma^2$. Denote the mean value of $v_i$ by $\alpha_i$ $(i = 1, \dots, k)$ and let $f(v_1, \dots, v_k, \sigma)$ be a function of the variables $v_1, \dots, v_k$ and $\sigma$ which does not involve the mean values $\alpha_1, \dots, \alpha_k$. Then, if the expected value of $f(v_1, \dots, v_k, \sigma)$ is equal to zero, $f(v_1, \dots, v_k, \sigma)$ is identically equal to zero, except perhaps on a set of measure zero.

Proof: Lemma 2 is obviously proved for all values of $\sigma$ if we prove it for $\sigma = 1$. Hence we will assume that $\sigma = 1$. It is known that a $k$-variate distribution which has moments equal to those of the joint distribution of $v_1, \dots, v_k$ must be identical with the joint distribution of $v_1, \dots, v_k$. That is to say, the joint distribution of $v_1, \dots, v_k$ is uniquely determined by its moments. Hence if

(16) $$\int_{-\infty}^{+\infty} \cdots \int_{-\infty}^{+\infty} v_1^{r_1} \cdots v_k^{r_k}\, \varphi(v_1, \dots, v_k)\, e^{-\frac{1}{2}\sum (v_i - \alpha_i)^2}\,dv_1 \cdots dv_k = 0$$

for any set $(r_1, \dots, r_k)$ of non-negative integers, then $\varphi(v_1, \dots, v_k)$ must be equal to zero except perhaps on a set of measure zero. Now let $f(v_1, \dots, v_k)$ be a function whose expected value is zero, i.e.

(17) $$\int_{-\infty}^{+\infty} \cdots \int_{-\infty}^{+\infty} f(v_1, \dots, v_k)\, e^{-\frac{1}{2}\sum (v_i - \alpha_i)^2}\,dv_1 \cdots dv_k = 0$$

identically in $\alpha_1, \dots, \alpha_k$. From (17) it follows that

(18) $$\int_{-\infty}^{+\infty} \cdots \int_{-\infty}^{+\infty} f(v_1, \dots, v_k)\, e^{-\frac{1}{2}\sum v_i^2 + \sum \alpha_i v_i}\,dv_1 \cdots dv_k = 0$$

identically in $\alpha_1, \dots, \alpha_k$. Differentiating the left hand side of (18) $r_1$ times with respect to $\alpha_1$, $r_2$ times with respect to $\alpha_2$, $\dots$, and $r_k$ times with respect to $\alpha_k$, we obtain

(19) $$\int_{-\infty}^{+\infty} \cdots \int_{-\infty}^{+\infty} v_1^{r_1} \cdots v_k^{r_k}\, f(v_1, \dots, v_k)\, e^{-\frac{1}{2}\sum v_i^2 + \sum \alpha_i v_i}\,dv_1 \cdots dv_k = 0.$$

From (16) and (19) it follows that $f(v_1, \dots, v_k) = 0$. Hence Lemma 2 is proved.

Using Lemmas 1 and 2 we can easily prove our theorem. Because of Lemma 1 we can restrict ourselves to critical regions $W$ which are given by an inequality of the following type

$$y_1^2 + \cdots + y_r^2 \ge \varphi(y_{r+1}, \dots, y_p, t)$$

where $\varphi(y_{r+1}, \dots, y_p, t)$ is some function of $y_{r+1}, \dots, y_p$ and $t$. The above inequality can be written as

(20) $$\frac{y_1^2 + \cdots + y_r^2}{y_1^2 + \cdots + y_r^2 + z_1^2 + \cdots + z_n^2} \ge \psi(y_{r+1}, \dots, y_p, t).$$


For any given values of $y_{r+1}, \dots, y_p, t$ denote by $P(y_{r+1}, \dots, y_p, t)$ the conditional probability that (20) holds calculated under the assumption that $\eta_1 = \cdots = \eta_r = 0$. It is obvious that $P(y_{r+1}, \dots, y_p, t)$ does not depend on the unknown parameters $\eta_{r+1}, \dots, \eta_p, \sigma$. If we denote by $W$ the critical region defined by the inequality (20), we have

(21) $$\beta_W(0, \dots, 0, \eta_{r+1}, \dots, \eta_p, \sigma) = \int_{-\infty}^{+\infty} \cdots \int_{-\infty}^{+\infty} \int_{0}^{\infty} P(y_{r+1}, \dots, y_p, t)\, p_1(y_{r+1}, \dots, y_p, \eta_{r+1}, \dots, \eta_p, \sigma)\, p_2(t, \sigma)\,dy_{r+1} \cdots dy_p\,dt$$

where $p_1(y_{r+1}, \dots, y_p, \eta_{r+1}, \dots, \eta_p, \sigma)$ denotes the joint probability density function of $y_{r+1}, \dots, y_p$ and $p_2(t, \sigma)$ denotes the probability density function of $t$ calculated under the assumption that $\eta_1 = \cdots = \eta_r = 0$. In order to satisfy the condition of our Theorem, the function $\psi$ in (20) must be chosen so that

(22) $$\int_{-\infty}^{+\infty} \cdots \int_{-\infty}^{+\infty} \int_{0}^{\infty} P(y_{r+1}, \dots, y_p, t)\, p_1(y_{r+1}, \dots, y_p, \eta_{r+1}, \dots, \eta_p, \sigma)\, p_2(t, \sigma)\,dy_{r+1} \cdots dy_p\,dt = \beta_0(0).$$


Let

(23) $$\int_{0}^{\infty} P(y_{r+1}, \dots, y_p, t)\, p_2(t, \sigma)\,dt = Q(y_{r+1}, \dots, y_p, \sigma).$$

Then we obtain from (22)

(24) $$\int_{-\infty}^{+\infty} \cdots \int_{-\infty}^{+\infty} Q(y_{r+1}, \dots, y_p, \sigma)\, p_1(y_{r+1}, \dots, y_p, \eta_{r+1}, \dots, \eta_p, \sigma)\,dy_{r+1} \cdots dy_p = \beta_0(0).$$

From (24) and Lemma 2 it follows that

(25) $$Q(y_{r+1}, \dots, y_p, \sigma) = \beta_0(0)$$

except perhaps on a set of measure zero. From (23), (25) and a result by P. L. Hsu⁵ we obtain

(26) $$P(y_{r+1}, \dots, y_p, t) = \beta_0(0)$$

except perhaps on a set of measure zero.

It follows easily from (26) that $\psi(y_{r+1}, \dots, y_p, t)$ is equal to a fixed constant except perhaps on a set of measure zero. This proves our Theorem.


⁵ P. L. Hsu, "Notes on Hotelling's generalized T," Annals of Math. Stat., Vol. 9, p. 237.



A NOTE ON THE ESTIMATION OF SOME MEAN VALUES FOR A BIVARIATE DISTRIBUTION

By Edward Paulson*

Columbia University

In this paper two problems are discussed which were suggested by the theory of representative sampling [1], but which also occur in several other fields. The first problem is to set up confidence limits for $m_x/m_y$, the ratio of the mean values of the variates $x$ and $y$. This comes up in the following situation. Let a population $\pi$ consist of $N$ units $x_1, x_2, \dots, x_N$ and suppose we wish to set up confidence limits for the mean $\bar X = \frac{1}{N}\sum_{i=1}^{N} x_i$. Also assume the population $\pi$ has been divided into $M$ groups; let $v_j$ be the number of individuals in the $j$th group and $u_j$ be the sum of the values of $x$ for the $v_j$ individuals in the $j$th group, so

$$\bar X = \frac{u_1 + u_2 + \cdots + u_M}{v_1 + v_2 + \cdots + v_M} = \frac{m_u}{m_v}.$$

Now if a random sample of $n$ out of the $M$ groups is taken, yielding observations $(u_1, v_1), (u_2, v_2), \dots, (u_n, v_n)$ and $N$ is unknown, the determination of confidence limits for $\bar X$ clearly becomes a special case of the first problem. The distribution of a ratio, discussed by Geary [2], does not seem to be well adapted for this purpose.

The second problem, which is of greater practical interest, arises when we again have a random sample $(u_1, v_1), \dots, (u_n, v_n)$ of $n$ out of $M$ groups and $N$ and $M$ are known. The standard estimate of $\bar X$ that has usually been made is $\dfrac{M \bar u}{N}$, where $\bar u = \dfrac{1}{n}\sum_{i=1}^{n} u_i$. This estimate does not utilise the fact that the $n$ observations on $v$ can be used to increase the precision of the estimate of the numerator of $\bar X$. This is a special case of problem 2, which we can now formulate as how to best estimate $m_x$ (the mean value of a trait $x$) both by a point and by an interval, when for each unit in the sample observations both on $x$ and on a correlated variate $y$ are obtainable, and $m_y$ is known a priori. Situations of this type occur fairly often. It is possible to reduce the second problem to the first by using $\dfrac{\bar x}{\bar y} m_y$ as the estimate of $m_x$, and by multiplying the confidence limits for $\dfrac{m_x}{m_y}$ by $m_y$ to secure limits for $m_x$, but this will not usually be the most efficient procedure.

In both problems two cases will be distinguished: (a) when $\sigma_x^2$, $\sigma_y^2$, and $\rho$ are known a priori, and (b) when they are unknown. To determine confidence

* Work done under a grant-in-aid from the Carnegie Corporation of New York.





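limits for $\dfrac{m_x}{m_y}$, it will first be assumed that the probability density $f(x, y)$ of $x$ and $y$ is

(1.1) $$f(x, y) = \frac{1}{2\pi \sigma_x \sigma_y \sqrt{1 - \rho^2}} \exp\left\{-\frac{1}{2(1 - \rho^2)}\left[\frac{(x - m_x)^2}{\sigma_x^2} - \frac{2\rho (x - m_x)(y - m_y)}{\sigma_x \sigma_y} + \frac{(y - m_y)^2}{\sigma_y^2}\right]\right\}.$$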

Denote the ratio $\dfrac{m_x}{m_y}$ by $K$ (assuming $m_y \neq 0$), and suppose it is desired to test the hypothesis that $K = K_0$ on the basis of a sample of $n$ independent observations $(x_1, y_1), \dots, (x_n, y_n)$. Let $z_i = x_i - K y_i$ and $\bar z = \frac{1}{n}\sum z_i$. Since $z$ is a linear function of $x$ and $y$ it must be normally distributed, and its mean value is obviously zero. Therefore

(1.2) $$u = \frac{\sqrt{n}\,\bar z}{\sigma_z} = \frac{\sqrt{n}\,(\bar x - K \bar y)}{\sqrt{\sigma_x^2 - 2K\rho\sigma_x\sigma_y + K^2\sigma_y^2}}$$

will be normally distributed about zero with unit variance, and the hypothesis is rejected if $|u(K_0)| > u_\alpha$, where $\dfrac{2}{\sqrt{2\pi}}\displaystyle\int_{u_\alpha}^{\infty} e^{-t^2/2}\,dt = \alpha$. It is easy to show that this test is equivalent to that based on the likelihood-ratio.

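Confidence limits for $K$ would now be given by values of $K$ satisfying the inequality

$$-u_\alpha \le \frac{\sqrt{n}\,(\bar x - K \bar y)}{\sqrt{\sigma_x^2 - 2K\rho\sigma_x\sigma_y + K^2\sigma_y^2}} \le u_\alpha,$$

provided they always constituted a closed non-empty interval. This is equivalent here to the requirement that $K$ be a real valued monotonic function of $u$ in the interval $-\infty < u < \infty$; this requirement is unfortunately never exactly fulfilled, as can be seen from the graph of (1.2) (in the $u$, $K$ plane), for the curve has two horizontal asymptotes $u = \pm \dfrac{\sqrt{n}\,\bar y}{\sigma_y}$, and one maximum or minimum point $\left(\text{unless } \rho = \dfrac{\bar x \sigma_y}{\bar y \sigma_x}\right)$. However, $K$ will always be a monotonic function of $u$ in the interval $-u_\alpha \le u \le u_\alpha$ provided $\left|\dfrac{\sqrt{n}\,\bar y}{\sigma_y}\right| > u_\alpha$. Since $m_y \neq 0$, by taking $n$ sufficiently large the probability that $\left|\dfrac{\sqrt{n}\,\bar y}{\sigma_y}\right| < u_\alpha$ can be made arbitrarily small. Moreover, for values of $\alpha$ ordinarily used, in most practical problems the value of $\dfrac{m_y}{\sigma_y}$ will be such that even for quite small samples the probability that $\left|\dfrac{\sqrt{n}\,\bar y}{\sigma_y}\right| < u_\alpha$ (that is, the probability of getting a sample for which the values of $K$ that are accepted will not form a real interval) will be quite negligible. For example, let $\alpha$ have the conventional value .05, and suppose $\dfrac{m_y}{\sigma_y} = 2$; then for $n = $ […], $\text{Prob}\left\{\left|\dfrac{\sqrt{n}\,\bar y}{\sigma_y}\right| < 1.96\right\} < 10^{-[\ldots]}$, and for $n = $ […], $\text{Prob}\left\{\left|\dfrac{\sqrt{n}\,\bar y}{\sigma_y}\right| < 1.96\right\} < 10^{-[\ldots]}$.

Subject to these rather weak restrictions on the order of magnitude of $n$ and $\dfrac{m_y}{\sigma_y}$, the confidence limits for $K$ are

(1.3) $$\frac{(n\bar x \bar y - u_\alpha^2 \rho\sigma_x\sigma_y) \pm \sqrt{(n\bar x \bar y - u_\alpha^2 \rho\sigma_x\sigma_y)^2 - (n\bar x^2 - u_\alpha^2\sigma_x^2)(n\bar y^2 - u_\alpha^2\sigma_y^2)}}{n\bar y^2 - u_\alpha^2\sigma_y^2}.$$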
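The limits (1.3) are the two roots in $K$ of $u(K)^2 = u_\alpha^2$, and this can be checked directly. The sketch below is illustrative only; the function names, the sample figures in the usage line, and the choice $n = 10$ in `prob_condition_fails` are assumptions made here and are not the values used in the paper's example.

```python
import math

def u_stat(K, xbar, ybar, n, sx, sy, rho):
    """Statistic (1.2) for a trial value K of the ratio m_x / m_y."""
    return math.sqrt(n) * (xbar - K * ybar) / math.sqrt(
        sx * sx - 2.0 * K * rho * sx * sy + K * K * sy * sy)

def limits_known(xbar, ybar, n, sx, sy, rho, u_alpha=1.96):
    """Confidence limits (1.3) when sigma_x, sigma_y and rho are known."""
    a = n * ybar * ybar - u_alpha ** 2 * sy * sy       # coefficient of K^2
    b = n * xbar * ybar - u_alpha ** 2 * rho * sx * sy
    c = n * xbar * xbar - u_alpha ** 2 * sx * sx
    if a <= 0:
        raise ValueError("need sqrt(n)*|ybar|/sigma_y > u_alpha for a real interval")
    root = math.sqrt(b * b - a * c)
    return (b - root) / a, (b + root) / a

def prob_condition_fails(n, ratio, u_alpha=1.96):
    """P(|sqrt(n)*ybar/sigma_y| < u_alpha) for normal ybar with m_y/sigma_y = ratio."""
    def Phi(t):
        return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))
    mu = math.sqrt(n) * ratio
    return Phi(u_alpha - mu) - Phi(-u_alpha - mu)

lo, hi = limits_known(xbar=10.0, ybar=5.0, n=25, sx=2.0, sy=1.0, rho=0.5)
print(lo, hi, prob_condition_fails(n=10, ratio=2.0))
```

The guard on the coefficient of $K^2$ is the same restriction discussed in the text: when it fails, the accepted values of $K$ do not form a closed interval.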


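In case (b) when $\sigma_x^2$, $\sigma_y^2$, and $\rho$ are unknown, each $z_i = x_i - K y_i$ is still normally and independently distributed with zero mean and a common variance. It follows that

(1.4) $$t = \frac{\sqrt{n}\,\bar z}{s_z} = \frac{\sqrt{n}\,(\bar x - K \bar y)}{\sqrt{s_x^2 - 2 r s_x s_y K + K^2 s_y^2}}$$

will have Student's distribution with $n - 1$ degrees of freedom. Subject to practically the same restriction as before, the confidence limits for $K$ as determined from (1.4) are

(1.5) $$\frac{(n\bar x \bar y - t_\alpha^2 r s_x s_y) \pm \sqrt{(n\bar x \bar y - t_\alpha^2 r s_x s_y)^2 - (n\bar x^2 - t_\alpha^2 s_x^2)(n\bar y^2 - t_\alpha^2 s_y^2)}}{n\bar y^2 - t_\alpha^2 s_y^2},$$

where $t_\alpha$ is the critical value of Student's distribution (for $n - 1$ degrees of freedom), $s_x^2 = \dfrac{\sum (x_i - \bar x)^2}{n - 1}$, $s_y^2 = \dfrac{\sum (y_i - \bar y)^2}{n - 1}$, and $r$ is the sample correlation between $x$ and $y$.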
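The limits (1.5) can be computed from raw pairs in the same way. In the sketch below the function names and the six data pairs are made up for illustration, and the critical value $t_\alpha$ is passed in by the caller (for example from a table) rather than computed.

```python
import math

def limits_unknown(xs, ys, t_alpha):
    """Confidence limits (1.5); variances and correlation estimated from the sample."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs) / (n - 1)                       # s_x^2
    syy = sum((y - ybar) ** 2 for y in ys) / (n - 1)                       # s_y^2
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / (n - 1)   # r*s_x*s_y
    a = n * ybar ** 2 - t_alpha ** 2 * syy
    b = n * xbar * ybar - t_alpha ** 2 * sxy
    c = n * xbar ** 2 - t_alpha ** 2 * sxx
    if a <= 0:
        raise ValueError("accepted values of K do not form a closed interval")
    root = math.sqrt(b * b - a * c)
    return (b - root) / a, (b + root) / a

def t_stat(K, xs, ys):
    """Statistic (1.4), computed directly from z_i = x_i - K*y_i."""
    n = len(xs)
    z = [x - K * y for x, y in zip(xs, ys)]
    zbar = sum(z) / n
    s_z = math.sqrt(sum((v - zbar) ** 2 for v in z) / (n - 1))
    return math.sqrt(n) * zbar / s_z

xs = [10.1, 9.5, 11.2, 10.8, 9.9, 10.4]   # hypothetical observations on x
ys = [5.0, 4.6, 5.5, 5.6, 4.8, 5.1]       # hypothetical observations on y
print(limits_unknown(xs, ys, t_alpha=2.571))   # 2.571: tabled 5% value, 5 d.f.
```

Evaluating `t_stat` at either returned limit gives $\pm t_\alpha$, which is a convenient check that the quadratic was set up correctly.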

When the distribution of $x$ and $y$ deviates considerably from a bivariate normal one, it would still appear that as a practical matter much the same methods could be used. The basis for this is the fact that there is considerable experimental evidence [3], [4] to show that the distribution of the mean of a sample drawn from any population likely to be encountered in practice will approach normality very rapidly even for $n$ quite small. Hence $\bar z$ and $u$ can be regarded as normally distributed for $n$ say $> 25$, and the confidence limits for $\dfrac{m_x}{m_y}$ will then be given by (1.3); in case (b) a somewhat larger sample is required to diminish the error in estimating $\sigma_z$. But for $n$ say $> 50$, $t$ will have a distribution close to normal and the confidence limits for $K$ are given by (1.5) (with $t_\alpha$ replaced by $u_\alpha$). The statements for the non-normal case appear as a practical matter to also hold when the sample is drawn from a finite population of $N$ units without replacement if $n$ is not too small, provided $n$ is replaced by

” (v - .0 ■ “ I (ff^) - w,/c + ,;k’] 

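In the second problem we again start by assuming the distribution of $x$ and $y$ is given by (1.1). For case (a), $m_x$ is the only unknown parameter. If $P = \prod_{i=1}^{n} f(x_i, y_i \mid m_x)$ and $\varphi = \dfrac{\partial \log P}{\partial m_x}$, then

$$\varphi = \frac{1}{2(1 - \rho^2)}\left(\frac{2\sum (x_i - m_x)}{\sigma_x^2} - \frac{2\rho}{\sigma_x\sigma_y}\sum (y_i - m_y)\right),$$

and the maximum likelihood estimate $\hat m_x$ of $m_x$ is

(1.6) $$\hat m_x = \bar x - \frac{\sigma_{xy}}{\sigma_y^2}(\bar y - m_y),$$

where $\sigma_{xy} = \rho\sigma_x\sigma_y$. Also $\hat m_x$ is a sufficient statistic, and the confidence interval given by the set of values of $m_x$ satisfying

$$-u_\alpha \le \frac{\sqrt{n}\,(\hat m_x - m_x)}{\sigma_x\sqrt{1 - \rho^2}} \le u_\alpha$$

will be a "shortest unbiased confidence interval" in the sense of Neyman.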

Case (b) will be more important, since the exact values of the variances and covariance will usually be unknown. By analogy with (1.6), a similar estimate of $m_x$ for this case is

(1.7) $$\tilde m_x = \bar x - \frac{s_{xy}}{s_y^2}(\bar y - m_y).$$

This is precisely the least square estimate of $x$, corresponding to $y = m_y$, and has been used for this problem before; for example, it is discussed by Cochran [5]. We shall discuss some additional aspects of the problem, and also mention the application to the special case of representative sampling by groups.
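The estimate (1.7) is the fitted least squares line of $x$ on $y$ evaluated at the known mean $m_y$. A minimal sketch follows; the function name and the data are hypothetical, chosen so that the points lie exactly on a line and the result can be read off.

```python
def regression_estimate(xs, ys, m_y):
    """Estimate (1.7): xbar - (s_xy / s_y^2) * (ybar - m_y)."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    s_xy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    s_yy = sum((y - ybar) ** 2 for y in ys)
    # The common divisor n - 1 cancels in the ratio s_xy / s_y^2.
    return xbar - (s_xy / s_yy) * (ybar - m_y)

ys = [1.0, 2.0, 3.0, 4.0]
xs = [2.0 + 3.0 * y for y in ys]      # exact line x = 2 + 3y
print(regression_estimate(xs, ys, m_y=3.0))
```

When the sample mean of $y$ happens to equal $m_y$, the correction term vanishes and the estimate reduces to $\bar x$.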

When the bivariate distribution of $x$ and $y$ is such that the conditional distribution of each $x_i$ is normal with mean $A + By_i$ and a common variance, then Professor Wald has suggested that exact confidence limits for $m_x$ for small samples can be secured by using the standard methods of the theory of least squares. The resulting confidence limits are easily seen to be

$$\tilde m_x \pm \frac{t_\alpha \lambda}{\sqrt{n - 2}},$$

where

$$\lambda^2 = (1 - r^2)\left[\sum_{i=1}^{n} (x_i - \bar x)^2\right]\left[\frac{1}{n} + \frac{(m_y - \bar y)^2}{\sum_{i=1}^{n} (y_i - \bar y)^2}\right]$$

and $t_\alpha$ is the critical value of Student's distribution with $n - 2$ degrees of freedom at a level of significance $= \alpha$.

The requirement that the regression of $x$ on $y$ be linear is rather stringent, although it may often be fulfilled, especially in the case of representative sampling mentioned in the opening paragraph. When the regression of $x$ on $y$ is non-linear, the estimate given by (1.7) requires some further justification. Let $U_{ij} = E(x^i y^j)$, where $E$ denotes the mean value, and assume that we have $n$
independent pairs of observations and that the moments lha , Un , Un , Uk , 
f/oa I lh<i, Um anti Un are, all finite. It then follows from a theorem of Doob [6] 
that $\sqrt{n}\,(\tilde m_x - m_x)$ tends to a limiting distribution with increasing $n$ which is normal with zero mean and variance equal to $\sigma_x^2(1 - \rho^2)$.

The estimate $\bar x$ is clearly always less efficient than $\tilde m_x$ unless $\rho = 0$. The estimate $\dfrac{\bar x}{\bar y} m_y$ is known to have a large sample variance

$$V = \frac{1}{n}\left[\sigma_x^2 - 2\rho\sigma_x\sigma_y\left(\frac{m_x}{m_y}\right) + \left(\frac{m_x}{m_y}\right)^2 \sigma_y^2\right].$$

So $\dfrac{\bar x}{\bar y} m_y$ is always less efficient than $\tilde m_x$ unless $m_x = \rho\dfrac{\sigma_x}{\sigma_y} m_y$, at which point $V$ attains its minimum value $\dfrac{\sigma_x^2(1 - \rho^2)}{n}$. In fact $\tilde m_x$ can be easily shown to have an efficiency $\ge$ any other statistic of the class $Q$ $\left(\text{which includes } \bar x \text{ and } \dfrac{\bar x}{\bar y} m_y\right)$ consisting of all statistics $q$ satisfying two conditions: (1) that $\sqrt{n}\,(q - m_x)$ have a distribution approaching normality with zero mean and finite variance $\sigma_q^2$, and (2) $\sigma_q^2$ be independent of the joint density function of $x$ and $y$, involving only certain of the moments $U_{ij}$. A rather artificial member of the class $Q$ is

g = The proof consists merely in observing that 

log ftiy Sy 

if for any bivariate distribution $\sigma_x^2(1 - \rho^2) > \sigma_q^2$, this would also have to be true when the distribution of $x$ and $y$ is a bivariate normal one, which is impossible, since $\sigma_x^2(1 - \rho^2)$ is then the variance of $\sqrt{n}\,(\tilde m_x - m_x)$, $\tilde m_x$ being the maximum likelihood statistic.
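The comparison of large sample variances made above is simple enough to tabulate. In this sketch the function names and the numerical values of $n$, $\sigma_x$, $\sigma_y$ and $\rho$ are illustrative assumptions; the formulas are the ones stated in the text.

```python
def ratio_estimate_variance(R, n, sx, sy, rho):
    """Large sample variance V of (xbar/ybar)*m_y, with R = m_x / m_y."""
    return (sx * sx - 2.0 * R * rho * sx * sy + R * R * sy * sy) / n

def regression_estimate_variance(n, sx, rho):
    """Large sample variance sigma_x^2 (1 - rho^2) / n of the estimate (1.7)."""
    return sx * sx * (1.0 - rho * rho) / n

n, sx, sy, rho = 50, 2.0, 1.0, 0.6
R_best = rho * sx / sy                      # the value of m_x/m_y at which V is least
for R in (0.0, 0.6, R_best, 2.0, 3.0):
    print(R, ratio_estimate_variance(R, n, sx, sy, rho))
print(regression_estimate_variance(n, sx, rho))
```

Setting $R = 0$ in the first function gives $\sigma_x^2/n$, the variance of $\bar x$, so the same table also shows the loss from ignoring $y$ altogether.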

For moderate values of $n$, say $n > 100$, fairly exact confidence limits for $m_x$ will be given by $\tilde m_x \pm \dfrac{u_\alpha}{\sqrt{n}}\sqrt{s_x^2(1 - r^2)}$. When the sample is drawn from a

finite population of N units without replacement, the confidence limits for 
n > 100 are ± /|/f 

In the problem of estimating $m_x = \bar X$ for the population $\pi$, discussed in the opening paragraph, which consists of $N$ individuals divided into $M$ groups, on the basis of a random sample $(u_1, v_1), (u_2, v_2), \dots, (u_n, v_n)$ of $n$ out of the $M$ groups, an efficient estimate will be

$$m' = \frac{M}{N}\left[\bar u - \frac{s_{uv}}{s_v^2}\left(\bar v - \frac{N}{M}\right)\right].$$

The efficiency of $m'$ relative to the conventional estimate $\dfrac{M \bar u}{N}$ is $(1 - \rho_{uv}^2)^{-1}$, which ordinarily would seem to be quite large. This is easily extended to the case $\pi$ is divided into $l$ strata with $M_i$ groups comprising $N_i$ individuals in the $i$th stratum, when a random sample of $m_i$ out of the $M_i$ groups in each stratum is taken. Let $v_{ij}$ be the number of individuals in the $j$th group of the $i$th stratum and $u_{ij}$ denote the sum of the values of $x$ for these $v_{ij}$ individuals. The estimate of $m_x$ becomes

$$m'' = \frac{1}{N}\sum_{i=1}^{l} M_i\left[\bar u_i - \frac{s_{u_i v_i}}{s_{v_i}^2}\left(\bar v_i - \frac{N_i}{M_i}\right)\right].$$

If $\sum_{i=1}^{l} m_i = m$ is fixed, the large sample variance of $m''$ will be a minimum if $m_i$ is proportional to $M_i \sigma_{u_i}\sqrt{1 - \rho_i^2}$, where $\rho_i$ is the correlation between $u$ and $v$ in the $i$th stratum.
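For sampling by groups, the estimate $m'$ applies (1.7) to the pairs $(u_j, v_j)$ with the known mean group size $N/M$. The sketch below is illustrative; the function names and the three sampled groups are hypothetical, with group totals made exactly proportional to group sizes so that the answer is known in advance.

```python
def group_estimate(us, vs, M, N):
    """m' = (M/N) * [ubar - (s_uv / s_v^2) * (vbar - N/M)]."""
    n = len(us)
    ubar, vbar = sum(us) / n, sum(vs) / n
    s_uv = sum((u - ubar) * (v - vbar) for u, v in zip(us, vs))
    s_vv = sum((v - vbar) ** 2 for v in vs)
    return (M / N) * (ubar - (s_uv / s_vv) * (vbar - N / M))

def conventional_estimate(us, M, N):
    """The conventional estimate M * ubar / N, which ignores the observed v."""
    return M * (sum(us) / len(us)) / N

vs = [10.0, 20.0, 30.0]               # sizes of the sampled groups
us = [2.5 * v for v in vs]            # group totals, exactly 2.5 per individual
print(group_estimate(us, vs, M=100, N=2500))
print(conventional_estimate(us, M=100, N=2500))
```

In this example the sampled groups are smaller than the population average of 25 individuals, so the conventional estimate is pulled down while $m'$ recovers the per-individual mean.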

In conclusion, the writer wishes to thank Professor A. Wald for his advice and encouragement, and Mr. Henry Goldberg for several suggestions.


REFERENCES

[1] J. Neyman, "On the two different aspects of the representative method," Journal of the Royal Statistical Society, Vol. 97 (1934), pp. 558-606.

[2] R. C. Geary, "The frequency distribution of the quotient of two normal variates," Roy. Stat. Soc. Jour., Vol. 93 (1930), pp. 442-446.

[3] W. A. Shewhart, Economic Control of Quality of Manufactured Product, New York, (1931), pp. 182-183.

[4] H. C. Carver, "Fundamentals in the theory of sampling," Annals of Math. Stat., Vol. 1 (1930), pp. 110-112.

[5] W. G. Cochran, "The use of the analysis of variance in enumeration by sampling," Jour. Amer. Stat. Assoc., Vol. 34 (1939), pp. 492-510.

[6] J. L. Doob, "The limiting distribution of certain statistics," Annals of Math. Stat., Vol. 6 (1935), p. 166.


SIGNIFICANCE LEVELS FOR THE RATIO OF THE MEAN SQUARE SUCCESSIVE DIFFERENCE TO THE VARIANCE

By B. I. Hart

Ballistic Research Laboratory, Aberdeen Proving Ground

For purposes of practical application in connection with, significance tests a 
tabulation of the argument corresponding to certain percentage points of the 
probability integral is usually more convenient than that of the probability 
integral for equal intervals of the argument. A table of probabilities for the 



[Table: Values of $\delta^2/s^2$ for Different Levels of Significance. Column headings: Values of $k$, Values of $k'$. The numerical entries are not recoverable from this copy.]


ratio of the mean square successive difference to the variance $\delta^2/s^2$, $P(\delta^2/s^2 < k) = \int_0^k \omega(\delta^2/s^2)\, d(\delta^2/s^2)$, where $\omega(\delta^2/s^2)$ is the distribution of $\delta^2/s^2$,¹ has been published recently² with $k$ as argument. The accompanying table of values of $\delta^2/s^2$ for $P = .001$, $.01$ and $.05$ has been computed from it by interpolation.

Since the distribution of $\delta^2/s^2$, $\omega(\delta^2/s^2)$, is symmetric about $E(\delta^2/s^2)$, $P(\delta^2/s^2 < k) = P(\delta^2/s^2 > k')$ if $E(\delta^2/s^2) - k = k' - E(\delta^2/s^2)$, where $E(\delta^2/s^2) = 2n/(n - 1)$.³ The upper levels are rarely of practical use, since large values
of the ratio could arise only from a somewhat artificial set of observations,
such as alternately high and low values of the observed variable.
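
As a rough illustration of the quantities discussed above, the following Python sketch computes the ratio for a sample and converts a lower significance level $k$ into the matching upper level $k'$ through the symmetry relation. It assumes von Neumann's definitions $\delta^2 = \sum (x_{i+1} - x_i)^2/(n - 1)$ and $s^2 = \sum (x_i - \bar{x})^2/n$, which are not restated in this note; the function names are illustrative only.

```python
def successive_difference_ratio(xs):
    """Return delta^2 / s^2 for the sequence xs (assumed definitions, see lead-in)."""
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two observations")
    mean = sum(xs) / n
    # Variance with divisor n (assumption: von Neumann's convention).
    s2 = sum((x - mean) ** 2 for x in xs) / n
    if s2 == 0:
        raise ValueError("all observations are equal; ratio undefined")
    # Mean square successive difference with divisor n - 1.
    d2 = sum((b - a) ** 2 for a, b in zip(xs, xs[1:])) / (n - 1)
    return d2 / s2


def expected_ratio(n):
    """E(delta^2 / s^2) = 2n / (n - 1), as stated in the text."""
    return 2 * n / (n - 1)


def upper_level(k, n):
    """Upper level k' paired with lower level k: E - k = k' - E."""
    return 2 * expected_ratio(n) - k
```

A steadily trending sequence such as 1, 2, 3, 4 gives a ratio well below the expectation, while an alternating sequence such as 1, -1, 1, -1 gives a ratio above it, which is the kind of artificial pattern the text associates with the upper levels.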

The computation of this table of significance levels was made at the suggestion
of Lt. Col. L. E. Simon.


¹ For determination of $\omega(\delta^2/s^2)$ cf. John von Neumann, “Distribution of the ratio of the
mean square successive difference to the variance,” Annals of Math. Stat., Vol. 12 (1941),
pp. 367-395.

² B. I. Hart, “Tabulation of the probabilities for the ratio of the mean square successive
difference to the variance,” Annals of Math. Stat., Vol. 13 (1942), p. 213.

³ Loc. cit.¹, p. 372, for proof of symmetry and evaluation of $E(\delta^2/s^2)$.


A CORRECTION 

By M. A. Girshick

U. S. Department of Agriculture, Washington 

In my article “Notes on the Distribution of Roots of a Polynomial with
Random Complex Coefficients” which appeared in the June 1942 issue of the
Annals of Mathematical Statistics, the symbol $\sum_{p=1}^{n} \sum_{q=p+1}^{n}$ in formulas (13), (14),
and (15) should be replaced by $\prod_{p=1}^{n} \prod_{q=p+1}^{n}$.



REPORT OF THE POUGHKEEPSIE MEETING OF THE INSTITUTE 

The Fifth Summer Meeting of the Institute of Mathematical Statistics was
held at Vassar College, Tuesday and Wednesday, September 8-9, 1942, in
conjunction with the meetings of the American Mathematical Society and the
Mathematical Association of America. The following fifty-eight members of
the Institute attended the meeting:

K. J. Arnold, L. A. Aroian, K. J. Arrow, Walter Bartky, Felix Bernstein, C. I. Bliss,
A. H. Bowker, J. H. Bushey, Belle Calderon, B. H. Camp, A. C. Cohen, Jr., A. H. Copeland,
C. C. Craig, J. H. Curtiss, W. E. Deming, J. L. Doob, M. L. Elveback, Willy Feller, M. M.
Flood, R. M. Foster, H. A. Freeman, T. N. E. Greville, C. C. Grove, E. J. Gumbel, Edward
Helly, G. M. Hopper, Harold Hotelling, Dunham Jackson, R. E. Jolliffe, Irving Kaplansky,
Karl Karsten, B. F. Kimball, Howard Levene, Eugene Lukacs, H. B. Mann, E. B. Mode,
E. C. Molina, F. C. Mosteller, C. R. Mummery, M. L. Norden, E. G. Olds, Oystein Ore,
Edward Paulson, Selby Robinson, F. E. Satterthwaite, Henry Scheffé, L. E. Simon, Mortimer
Spiegelman, Arthur Stein, J. R. Tomlinson, A. W. Tucker, J. W. Tukey, D. P. Votaw,
Jr., Abraham Wald, S. S. Wilks, E. W. Wilson, Jacob Wolfowitz, L. C. Young.

The opening session, on Tuesday afternoon, was devoted to contributed
papers on Probability and Statistics and was held jointly with the American
Mathematical Society. The Chairman was Professor Cecil C. Craig, University
of Michigan, and the following papers were presented:

1. On the Theory of Testing Composite Hypotheses With One Constraint.

Henry Scheffé, Princeton University.

2. On the Consistency of a Class of Non-parametric Statistics.

Jacob Wolfowitz, Staten Island, N. Y.

3. Graphical Controls Based on Serial Numbers.

E. J. Gumbel, New School for Social Research.

4. Significance Tests for Multivariate Distributions.

D. S. Villars, United States Rubber Company. (Introduced by E. G. Olds.)

5. On the Choice of the Number of Class Intervals in the Application of the Chi-square Test.

H. B. Mann and Abraham Wald, Columbia University.

6. Generalized Poisson Distribution.

F. E. Satterthwaite, Aetna Life Insurance Company.

7. The Relationship of Fisher's z Distribution to Student's t Distribution.

Leo A. Aroian, Hunter College.

8. On a Statistical Problem Arising in the Classification of an Individual in One of Two Groups.

Abraham Wald, Columbia University.

9. Modern Statistical Methods in Penology.

Saly R. R. Struik, Radcliffe College.

Miriam van Waters, Framingham, Mass.

10. Regularity of Label-sequences Under Configuration Transformations.

T. N. E. Greville, Bureau of the Census.

By Title:

On the Ratio of the Variances of Two Normal Populations.

Henry Scheffé, Princeton University.

Abstracts of these papers follow this report. 



On Wednesday morning Professor Harold Hotelling, Columbia University,
acted as Chairman of a session on Stochastic Processes. The following papers
were presented:

1. [Title illegible in source.]

A. H. Copeland, University of Michigan.

2. [Title illegible in source.]

Willy Feller, Brown University.

3. [Title illegible in source.]

J. L. Doob, University of Illinois.

The session on Wednesday afternoon was held jointly with the American
Mathematical Society. Lt. Col. Leslie E. Simon, U. S. A., served as Chairman,
and the following papers on The Applicability of Mathematical Statistics to War
Efforts were presented:

1. [Opening words illegible in source] Special Reference to the Problem of Tolerance Limits.

S. S. Wilks, Princeton University.

Discussant: J. H. Curtiss, Cornell University.

2. On the [word illegible in source] of Statistics in Quality Control.

W. Edwards Deming, Bureau of the Census.

Discussant: Walter Bartky, University of Chicago.

A meeting of the Board of Directors was held on Tuesday evening. Following
the joint dinner on Wednesday evening, a concert was given in Skinner Hall by
members of the music department of Vassar College.

Edwin G. Olds,

Secretary



ABSTRACTS OF PAPERS 

(Presented on September 8, 1942, at the Poughkeepsie meeting of the Institute)

On the Theory of Testing Composite Hypotheses with One Constraint. Henry
Scheffé, Princeton University.

A composite hypothesis with one constraint specifies the value of one and only one
parameter of a set occurring in a distribution function. The theory of testing such hypotheses
is not only of direct interest for many important problems, but is intimately related
to Neyman's theory of confidence intervals (Phil. Trans. Roy. Soc. London, 1937). A
method of Neyman (Bull. Soc. Math. France, 1935) for finding type B regions for testing
these hypotheses is extended to the case of any number of nuisance parameters. Type B₁
regions are defined by generalizing the type A₁ regions of Neyman and Pearson (Stat. Res.
Mem., 1936) to the case where nuisance parameters are present, and sufficient conditions
are found that a type B region be also of type B₁. An interesting moment problem is
encountered, in which the admissible functions are not of constant sign, and is solved for
the case where the original distribution is multivariate normal.


On the Consistency of a Class of Non-Parametric Statistics. J. Wolfowitz,
N. Y. City.

Let $X$ and $Y$ be two stochastic variables about whose distribution nothing is known
except that they are continuous and let it be required to test whether their distribution
functions are the same. Let $V$ be the observed sequence of zeros and ones constructed as
described elsewhere (Wald and Wolfowitz, Annals of Math. Stat., Vol. 11 (1940), p. [page illegible in source]).
Suppose that the statistic $S(V)$ used to test the hypothesis is of the form $S(V) = \sum_j \varphi(l_j)$,
where $l_j$ is the length of the $j$-th run and $\varphi(x)$ a suitable function defined for all positive
integral $x$. The notion of consistency, originated by Fisher for parametric problems, has
already been extended to the non-parametric case (loc. cit., p. [page illegible in source]). The author now
proves that, subject to reasonable conditions on $\varphi(x)$ and statistically unimportant restrictions
on the alternatives to the null hypothesis, statistics of the type $S(V)$ are consistent.
In particular, a statistic discussed by the author (Annals of Math. Stat., September, 1942)
and for which $\varphi(x)$ = [expression illegible in source] belongs to the class covered by the theorem.


Graphical Controls Based on Serial Numbers. E. J. Gumbel, New School
for Social Research.

The index $m$ of the observed value $x_m$ ($m = 1, 2, \ldots, n$) is called its serial number. A
value $x$ of a continuous statistical variable defined by a probability $W(x) = \lambda$ is called a
grade (e.g. the median for $\lambda = 1/2$). The coordination of serial numbers with grades furnishes
two graphical methods for comparing the observations and the theory, namely the equiprobability
test based on $m = n\lambda$, and the return periods based on $m = n\lambda +$ [term illegible in source].

Starting from the distribution of the $m$th value, we determine the most probable serial
number $m = n\lambda + \Delta$, where $\Delta$ depends upon the distribution. For a symmetrical distribution,
the corrections $\Delta$ for two grades defined by $\lambda$ and $1 - \lambda$ are equal in absolute
value and opposite in sign. Then no correction is needed for the median. For an asymmetrical
distribution, we calculate the most probable serial number of the mode, considered
as an $m$th value. Thus the mode is obtained from the observations through the
theory. In this case the mode is not the most precise $m$th value.

If $m$ is of the order $n/2$, the distribution of the $m$th value converges towards a normal
distribution with an expectation given by $m = nW(x)$, and a standard deviation $s(x)$, where
$s(x) = \sqrt{W(x)\,(1 - W(x))} \,/\, (\sqrt{n}\, w(x))$. By attributing to each theoretical value $x$ its standard
deviation, we obtain intervals $x \pm s(x)$ which may be used for the control of the equiprobability
test, the comparison of the observed step function with the frequency, and the
comparison of the observed with the theoretical return periods. Besides, the standard
error of the $m$th value leads to the precision of the determination of a constant obtained
from a grade.
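
The control intervals described in this abstract can be sketched numerically. The Python fragment below evaluates $s(x) = \sqrt{W(1 - W)}/(\sqrt{n}\, w)$ and the interval $x \pm s(x)$ for a given grade, taking the standard normal distribution as the theoretical law. That choice of distribution, and the function names, are assumptions made for illustration; the abstract does not fix a particular distribution.

```python
from math import sqrt
from statistics import NormalDist


def grade_standard_deviation(x, n, dist=NormalDist()):
    """Large-sample standard deviation of the observed value at grade x."""
    W = dist.cdf(x)   # probability W(x) defining the grade
    w = dist.pdf(x)   # density w(x) at the grade
    return sqrt(W * (1.0 - W)) / (sqrt(n) * w)


def control_interval(lam, n, dist=NormalDist()):
    """Interval x - s(x), x + s(x) for the grade with W(x) = lam."""
    x = dist.inv_cdf(lam)
    s = grade_standard_deviation(x, n, dist)
    return x - s, x + s
```

For example, `control_interval(0.5, 100)` gives the band around the median of a standard normal variable for 100 observations; grades far in the tails give wider bands because the density in the denominator becomes small.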

Significance Tests for Multivariate Distributions. D. S. Villars, U. S. Rubber
Company.

The observed mean of sets of $m$ variates, each normally and independently distributed,
is distributed around the population mean according to a $\chi^2$ distribution with $m$ degrees
of freedom. The sum of squares of deviations of $n$ observed points from the observed mean
is distributed as $\chi^2$ with $m(n - 1)$ degrees of freedom (not with $n - 1$). A much more
powerful test for correlation than that by the correlation coefficient is described, which, for
bivariate distributions, involves comparisons between $n - 1$ and $n - 1$ degrees of freedom.
This can be extended to $m - 1$ tests with $m$ variates. Distribution of distance between two
means and distribution of reduced radius is worked out in detail for two variates.


On the Choice of the Number of Class Intervals in the Application of the Chi-
Square Test. H. B. Mann and A. Wald, Columbia University.

The distance of two distribution functions is defined as the l.u.b. of the absolute value
of the difference between the two cumulative distribution functions. Let $C(\Delta)$ be the class
of alternatives with distance $\geq \Delta$ from the null-hypothesis. Let $f(N, k, \Delta)$ be the g.l.b.
of the power with respect to alternatives in $C(\Delta)$ of the chi-square test with sample size $N$
and $k$ equally probable class intervals. A positive integer $k$ is called best with respect to
sample size $N$ if there exists a $\Delta$ such that $f(N, k, \Delta) = 1/2$ and $f(N, k', \Delta) \leq 1/2$ for every
positive integer $k'$. The authors show that $k_N = 4\left(2(N - 1)^2/c^2\right)^{1/5}$, where $\frac{1}{\sqrt{2\pi}} \int_c^{\infty} e^{-x^2/2}\, dx$
is equal to the size of the critical region, fulfills approximately the conditions of a best $k$
with $\Delta_N = 5/k_N - 4/k_N^2$ as corresponding value of $\Delta$. The approximation is shown to be
satisfactory for $N \geq 450$ if the 5% level of significance is used and for $N \geq 300$ if the 1%
level is used.
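
To give a feel for the sizes involved, the following Python sketch evaluates the number of class intervals for a given sample size and significance level. It assumes the rule $k_N = 4\left(2(N - 1)^2/c^2\right)^{1/5}$, with $c$ the upper normal deviate for the chosen level, which is the form in which the Mann and Wald result is usually quoted; the function name is hypothetical.

```python
from statistics import NormalDist


def mann_wald_k(N, alpha):
    """Approximate best number of equally probable class intervals.

    Assumes k_N = 4 * (2 (N - 1)^2 / c^2)^(1/5), where the upper tail
    area of the standard normal beyond c equals alpha.
    """
    if N < 2 or not 0.0 < alpha < 0.5:
        raise ValueError("need N >= 2 and 0 < alpha < 0.5")
    c = NormalDist().inv_cdf(1.0 - alpha)   # upper normal deviate for level alpha
    return 4.0 * (2.0 * (N - 1) ** 2 / c ** 2) ** 0.2
```

With `mann_wald_k(450, 0.05)` the rule suggests a little over forty class intervals, and the value grows only as the two-fifths power of the sample size.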


Generalized Poisson Distribution. F. E. Satterthwaite, Aetna Life Insurance
Company.

In this paper the Poisson distribution is generalized to allow for the assignment of
varying weights to a set of events when the number of events follows the Poisson law.
The development used brings out the fact that distributions falling in [this class do not]
require that the underlying statistics be homogeneous. The only requirement is that they
be independent. Formulas are given for the moments of the generalized distribution as
functions of the moments of the underlying distribution of weights. The principles to
be observed in the solution of practical problems are outlined.
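
The abstract does not reproduce its moment formulas. As a hedged illustration of how such formulas can look, the Python sketch below uses the standard result for a Poisson-distributed number of independent weighted events: the $r$-th cumulant of the total is the Poisson rate times the $r$-th raw moment of the weight distribution. Treating this as the setting of the paper is an assumption, and the function name and arguments are hypothetical.

```python
def weighted_poisson_cumulants(rate, weights, probs, order=3):
    """Cumulants 1..order of a sum of independent weights over a Poisson count.

    rate    -- Poisson mean number of events
    weights -- possible weight values
    probs   -- probability of each weight (must sum to 1)
    """
    if len(weights) != len(probs):
        raise ValueError("weights and probs must have the same length")
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("probs must sum to 1")
    cumulants = []
    for r in range(1, order + 1):
        raw_moment = sum(p * w ** r for w, p in zip(weights, probs))
        cumulants.append(rate * raw_moment)   # kappa_r = rate * E[W^r]
    return cumulants
```

The first cumulant is the mean and the second is the variance. With every weight equal to 1 the function returns the rate for each cumulant, which recovers the ordinary Poisson law.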


The Relationship of Fisher's z Distribution to Student's t Distribution. Leo
A. Aroian, Hunter College.

For $n_1$ and $n_2$ sufficiently large, [expression illegible in source] is distributed as Student's t with $N$ degrees
of freedom, $N = n_1 + n_2 - 1$, [expression illegible in source]. If the level of significance is $\alpha$ for
Student's distribution, the level of significance for $z$ will be [expression illegible in source]. As a
corollary it follows that the distribution of $z$ approaches normality, $n_1, n_2 \to \infty$, with mean
zero and variance $\frac{1}{2}\left(\frac{1}{n_1} + \frac{1}{n_2}\right)$. This simplifies a previous proof of the author. Application
of this result is made to finding levels of significance of the $z$ distribution. On the whole
R. A. Fisher's formulas for this purpose, $n_1$ and $n_2$ large, as modified by W. G. Cochran are
superior. The results given by the Fisher-Cochran formulas are compared with those
obtained by using the formula recently found by E. Paulson.


On a Statistical Problem Arising in the Classification of an Individual in One of
Two Groups. Abraham Wald, Columbia University.

Let $\pi_1$ and $\pi_2$ be two $p$-variate normal populations which have a common covariance
matrix. A sample of size $N_i$ is drawn from the population $\pi_i$ ($i = 1, 2$). Denote by $x_{i\alpha}$
the $\alpha$-th observation on the $i$-th variate in $\pi_1$, and by $y_{i\beta}$ the $\beta$-th observation on the $i$-th
variate in $\pi_2$. Let $z_i$ ($i = 1, \ldots, p$) be a single observation on the $i$-th variate drawn from a
population $\pi$ where it is known that $\pi$ is equal either to $\pi_1$ or to $\pi_2$. The parameters of
the populations $\pi_1$ and $\pi_2$ are assumed to be unknown. It is shown that for testing the
hypothesis $\pi = \pi_1$ a proper critical region is given by $U \geq d$ where $U = \sum_i \sum_j s^{ij} z_i (\bar{y}_j - \bar{x}_j)$,
$\| s^{ij} \| = \| s_{ij} \|^{-1}$, $s_{ij} = \left[ \sum_\alpha (x_{i\alpha} - \bar{x}_i)(x_{j\alpha} - \bar{x}_j) + \sum_\beta (y_{i\beta} - \bar{y}_i)(y_{j\beta} - \bar{y}_j) \right] / (N_1 + N_2 - 2)$,
$\bar{x}_i = \left( \sum_\alpha x_{i\alpha} \right)/N_1$, $\bar{y}_i = \left( \sum_\beta y_{i\beta} \right)/N_2$ and $d$ is a constant. The large sample distribution
of $U$ is derived and it is shown that $U$ is a simple function of three angles in the sample
space whose exact joint sampling distribution is derived.


Modern Statistical Methods in Penology. Saly R. R. Struik, Radcliffe
College and Miriam van Waters, Massachusetts Reformatory for Women.

In applying statistical methods to penological problems, so far the best known studies
have considered 100, 600, or once in England (to refute Lombroso's theory) 1800 cases.
But from the correct statistical standpoint, far more cases are needed to establish a law.
Over a period of years, an attempt has been made to use statistical methods in the study
of penological problems in the Massachusetts Reformatory for Women, but the results
will take on real significance and be conclusive only when similar investigations are made
all over the United States.

Regularity of Label-Sequences Under Configuration Transformations. T. N. E.
Greville, Bureau of the Census.

There is developed a class of transformations on sequences of arbitrary labels in terms
of which a wide variety of problems in the theory of probability can be formulated. It is
shown that, with mild restrictions on the transformations used and on the measure function
assumed on the label-space, almost every label-sequence produces a transform having the
frequency distribution expected. The class of transformations considered is shown to
include as special cases the four fundamental operations of von Mises: place selection,
partition, mixing, and combination.







On the Ratio of the Variances of Two Normal Populations. Henry Scheffé,
Princeton University.

Let $\theta$ be the ratio of the variances [illegible in source]. The [illegible in source]
and comparison of significance tests for [illegible in source] confidence
intervals for $\theta$. The paper [illegible in source]
and only solutions based on the F-distribution are considered. [Illegible in source]
six tests and corresponding [illegible in source] confidence intervals [illegible in source]. It turns
out that the limits on the F-distribution which [illegible in source]
which yield confidence intervals [illegible in source]. These
limits are difficult to compute and [illegible in source]
efficiency in using [illegible in source] limits. The [illegible in source] part of
the paper is concerned with [illegible in source] regions and type B₁
regions, and the application of Neyman's theory of confidence intervals [illegible in source]
confidence intervals not already considered in part I [illegible in source] those previously
judged best of a very narrow class are shown to be best of all based on similar
regions of the same size.











